Between them, the thirty Major League Baseball teams have about 12 million Twitter followers. That's a lot, and some teams (read: Yankees) have more than others (read: Marlins). Not all of these 12 million accounts are real people, however, and according to at least one metric, the number of accounts that are active is under 4 million. After doing some spot checking, I'm fairly confident in saying that some of these accounts were never "real," but that doesn't mean that any team has been deceptive in its Twitter practices.
I tried two web sites to check on the composition of the followers of team accounts. The first, at twitteraudit.com, spit out some total numbers for "real" and "fake" followers. The free version of that service doesn't re-audit any accounts -- it gives you the results of previous audits, if they've already been done. That means that the numbers I got from that site are from wildly different times -- some within a month or so, and some over a year old. For that reason, I present them for context only.
Team | Fake | Total | Percent Fake |
---|---|---|---|
BAL | 64365 | 192135 | 33.5 |
BOS | 185914 | 540448 | 34.4 |
NYY | 332237 | 834766 | 39.8 |
TBR | 61089 | 195799 | 31.2 |
TOR | 274490 | 499072 | 55.0 |
CWS | 89398 | 229226 | 39.0 |
CLE | 58623 | 203551 | 28.8 |
DET | 171871 | 470880 | 36.5 |
KAN | 40977 | 187109 | 21.9 |
MIN | 70020 | 242284 | 28.9 |
HOU | 53111 | 169144 | 31.4 |
LAA | 41043 | 180807 | 22.7 |
OAK | 51068 | 182387 | 28.0 |
SEA | 86421 | 228024 | 37.9 |
TEX | 114509 | 424109 | 27.0 |
ATL | 177339 | 554183 | 32.0 |
MIA | 40451 | 137121 | 29.5 |
NYM | 109410 | 291759 | 37.5 |
PHI | 274369 | 757926 | 36.2 |
WAS | 58183 | 200630 | 29.0 |
CHC | 56292 | 222500 | 25.3 |
CIN | 102307 | 225768 | 40.0 |
MIL | 63530 | 191354 | 33.2 |
PIT | 83910 | 296500 | 28.3 |
STL | 224043 | 537274 | 41.7 |
ARI | 43463 | 159559 | 31.0 |
COL | 44809 | 145484 | 30.8 |
LAD | 187282 | 610038 | 30.7 |
SDP | 31880 | 141064 | 22.6 |
SFG | 251006 | 707060 | 35.5 |
Again, one probably shouldn't compare teams against each other with the numbers above; they were collected at different times. In addition, twitteraudit.com uses a sampling method we can't review (though the site says it samples 5,000 follower accounts per audit), and on top of that, the results can only be as good as the site's algorithm. The site says that it uses "number of tweets, date of the last tweet, and ratio of followers to friends" in its math ("friends," in Twitter's terminology, are the accounts a user follows), but not how it does so.
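For a rough sense of how much a 5,000-account sample alone could move those percentages, here's a minimal sketch. It assumes the sample is a simple random draw from an account's followers, which is my assumption and not something the site confirms, and it uses the standard normal approximation for a sampled proportion. The function name is just for illustration.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sampled proportion.

    p: observed proportion (0.335 for 33.5%)
    n: sample size (assumed to be a simple random sample)
    z: z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Orioles row from the table above: 33.5% "fake", assumed sample of 5,000.
moe = margin_of_error(0.335, 5000)
print(f"33.5% +/- {100 * moe:.1f} percentage points")
```

Under those assumptions the sampling noise comes to about 1.3 percentage points, so the bigger unknowns are probably the algorithm and the age of each audit, not the sample size.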
A better tool may be Rob Waller's StatusPeople, which generates something called a "Faker Score." It looks at the ratio of accounts followed to followers; Waller notes that most normal users have a ratio of 2:1 or 1:1, but that for fake accounts, the ratio can be "way out of any normal range, between 100:1 to 10:1." It also zeros in on accounts' content (mostly retweets, or only tweets with no links). The tool also checks whether or not users' Twitter bios are filled out properly.
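To make the ratio test concrete, here's a minimal sketch of what a check like that might look like. This is not StatusPeople's actual code; the function names and the 10:1 cutoff (taken from the low end of the range Waller describes) are my own assumptions, and the real Faker Score also weighs content and bios.

```python
def following_ratio(following: int, followers: int) -> float:
    """Accounts followed per follower.

    Returns infinity when an account follows others but has no followers,
    and 0.0 when it neither follows nor is followed.
    """
    if followers == 0:
        return float("inf") if following > 0 else 0.0
    return following / followers


def ratio_looks_suspicious(following: int, followers: int,
                           threshold: float = 10.0) -> bool:
    """Flag accounts whose ratio is at or above the (assumed) 10:1 cutoff."""
    return following_ratio(following, followers) >= threshold


# A typical user: follows 300 accounts, has 150 followers (2:1).
print(ratio_looks_suspicious(300, 150))   # False
# A likely spam account: follows 2,000 accounts, has 20 followers (100:1).
print(ratio_looks_suspicious(2000, 20))   # True
```

A ratio on its own would misfire on plenty of real people, which is presumably why the tool combines it with other signals.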
The twitteraudit.com tool uses "date of the last tweet" to help determine whether an account is real, but the Faker Score tool does not. Perhaps the best feature of Waller's product is that it doesn't just separate "real" and "fake"; it has a third category for "inactive" followers. And that's fortunate for us, because it seems that "inactive" followers are collectively the biggest reason for the bloated fake follower percentages above. Waller was very gracious, and gave me the ability to generate "Faker Scores" for all thirty MLB team accounts (all run within a day of November 25th; the last column is each account's "joined" date):
Team | Fake% | Inactive% | Good% | Total | Joined |
---|---|---|---|---|---|
BAL | 11 | 49 | 40 | 279625 | May-09 |
BOS | 10 | 49 | 41 | 895877 | May-09 |
NYY | 10 | 53 | 37 | 1268456 | May-09 |
TBR | 19 | 56 | 25 | 200156 | May-09 |
TOR | 4 | 67 | 29 | 553258 | May-09 |
CWS | 14 | 55 | 31 | 234167 | Jul-09 |
CLE | 16 | 53 | 31 | 257050 | Jul-09 |
DET | 7 | 58 | 35 | 531480 | Apr-09 |
KAN | 10 | 44 | 46 | 296119 | Apr-09 |
MIN | 19 | 54 | 27 | 246743 | May-09 |
HOU | 25 | 43 | 32 | 171285 | Jul-09 |
LAA | 12 | 53 | 35 | 235242 | May-09 |
OAK | 16 | 56 | 28 | 219171 | Jan-09 |
SEA | 22 | 51 | 27 | 230740 | May-09 |
TEX | 14 | 58 | 28 | 448353 | May-09 |
ATL | 14 | 51 | 35 | 557063 | Feb-09 |
MIA | 23 | 36 | 41 | 137867 | Jul-09 |
NYM | 20 | 49 | 31 | 296102 | May-09 |
PHI | 10 | 62 | 28 | 884987 | Jul-09 |
WAS | 17 | 52 | 31 | 222752 | May-09 |
CHC | 10 | 55 | 35 | 412067 | May-09 |
CIN | 16 | 56 | 28 | 345478 | Apr-09 |
MIL | 16 | 54 | 30 | 229000 | Jul-09 |
PIT | 16 | 51 | 33 | 297669 | May-09 |
STL | 9 | 52 | 39 | 546975 | Jul-09 |
ARI | 18 | 58 | 24 | 161729 | Apr-09 |
COL | 19 | 53 | 28 | 158202 | Jun-10 |
LAD | 11 | 53 | 36 | 614649 | Mar-09 |
SDP | 22 | 48 | 30 | 142854 | May-09 |
SFG | 9 | 45 | 46 | 710918 | May-09 |
Of the thirty accounts, the one with the earliest "joined" date is, perhaps not surprisingly, the Oakland Athletics. They joined in January 2009, and were soon followed by the Braves in February and the Dodgers in March of the same year. Then it seems like some kind of memo went out at the end of April or in May, because four more teams joined in April, and another fifteen joined in May 2009. Then maybe there was some kind of reminder memo, because despite no "joined" dates in June that year, another seven teams joined in July. If anyone has an explanation for why the Rockies account has a "joined" date of a year later (July 2010), I'd love to hear it (their PR account has a "joined" date of May 2009).
The joined dates may matter because now, almost six years after most team accounts started up, most have been around for more or less the same amount of time. And that's the single biggest reason, I believe, why such a small proportion of teams' Twitter followers are active users; stick around long enough, and you'll rack up "followers" who have abandoned Twitter but whose accounts are still around. According to StatusPeople, 48% of the followers of the @mlb account (join date December 2008) are inactive, with just 8% designated "fake." I checked on Slate's account @slate because it had a very early join date (June 2008), and StatusPeople says 50% of Slate's more than 1.2 million followers are "inactive." I also checked the mother of all accounts, @twitter (joined February 2007), and found that 62% of that account's followers are considered "inactive," with another 7% designated "fake."
I asked Waller if the high inactive percentages for the thirty team accounts could be largely explained by how long they have been around, and he agreed. But he raised another point: many Twitter users actively read tweets, without writing any tweets of their own; "a lot of Twitter users just use it as a news source, they don't engage," according to Waller. "So logically popular accounts will have a lot of inactive accounts because they are producing content that 'data consumers' are interested in."
Inactive accounts aside, there still appear to be about 1.5M followers among the thirty accounts that are from "fake" Twitter users. But even then, there's an explanation for why we should not hold teams responsible. "There is definitely a tendency for spam accounts to follow popular accounts," said Waller. "It could encourage people to follow the spam accounts via their association with popular accounts."
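The fake-follower total can be approximated from the Faker Score table by weighting each team's Fake% by its follower count. Here's a minimal sketch of that arithmetic; the rows are copied from the table, the helper names are mine, and because the percentages are rounded to whole numbers the result is only an estimate.

```python
# Rows from the Faker Score table:
# team, fake %, inactive %, good %, total followers
TABLE = """
BAL 11 49 40 279625
BOS 10 49 41 895877
NYY 10 53 37 1268456
TBR 19 56 25 200156
TOR 4 67 29 553258
CWS 14 55 31 234167
CLE 16 53 31 257050
DET 7 58 35 531480
KAN 10 44 46 296119
MIN 19 54 27 246743
HOU 25 43 32 171285
LAA 12 53 35 235242
OAK 16 56 28 219171
SEA 22 51 27 230740
TEX 14 58 28 448353
ATL 14 51 35 557063
MIA 23 36 41 137867
NYM 20 49 31 296102
PHI 10 62 28 884987
WAS 17 52 31 222752
CHC 10 55 35 412067
CIN 16 56 28 345478
MIL 16 54 30 229000
PIT 16 51 33 297669
STL 9 52 39 546975
ARI 18 58 24 161729
COL 19 53 28 158202
LAD 11 53 36 614649
SDP 22 48 30 142854
SFG 9 45 46 710918
"""


def parse_rows(text):
    """Turn the whitespace-separated table into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        team, fake, inactive, good, total = line.split()
        rows.append({"team": team, "fake": int(fake),
                     "inactive": int(inactive), "good": int(good),
                     "total": int(total)})
    return rows


def estimated_count(rows, category):
    """Sum of (total followers x category percentage) across all rows."""
    return sum(r["total"] * r[category] / 100 for r in rows)


rows = parse_rows(TABLE)
total_followers = sum(r["total"] for r in rows)
fake_followers = estimated_count(rows, "fake")
print(f"total followers: {total_followers:,}")
print(f"estimated fake followers: {fake_followers:,.0f}")
```

Run against the table as printed, the estimate lands a little under 1.5 million "fake" followers out of roughly 11.8 million in total.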
In poking around myself and finding accounts that were clearly "fake," I often found that such accounts also followed an array of other popular accounts from the same geographic area, including other sports teams and newspapers. This could help make the accounts look like they belong to the matching geographic area -- which may assist in the selling of follows to other accounts. It may even be the case that Twitter itself suggests MLB Twitter accounts to newly created "fake" accounts, since Twitter's suggestions seem to be keyed to what Twitter believes is the user's geographic area.
Any "popular" account may have a ton of followers that could be reported by a tool like twitteraudit.com to be "fake." But these popular accounts, including MLB team accounts, may have a number of "data consumer" followers, and may have accumulated a number of "inactive" accounts just by virtue of how long the account has been around — and many of the truly "fake" accounts may not have had the cooperation of the "popular" account holders. There's a real market for "fake" followers, and they can be a helpful way to give a new brand a kick start. With respect to MLB team accounts, however, we have every reason to think that there are no shenanigans being pulled.
. . .
The author wishes to thank Rob Waller and StatusPeople for answering his questions and granting him access to run all thirty MLB Twitter accounts through Faker Score.
Ryan P. Morrison is a writer and editor at Beyond The Box Score. He writes about the Arizona Diamondbacks at Inside the 'Zona, and talks D-backs and sabermetrics with co-author Jeff Wiser on The Pool Shot. You can follow him on Twitter (@InsidetheZona), but he makes no claims about the provenance of his existing followers.