Tuesday, August 28, 2007

The Top 100 Words on Digg

As I mentioned in an earlier post (Stories with Longer Titles get More Diggs), I used the Digg API to download the titles and number of Diggs for over 230,000 stories. In this post, I continue my analysis of this data with a list of the top 100 words on Digg. To make this list, I first made a list of every word that occurred at least 25 times across all 233,570 story headlines. Then I sorted that list by the average number of Diggs received by stories whose titles contain a particular word. The result is a list of words that appear in the titles of stories that got large numbers of Diggs. Without further ado, here is the list:

comcast programmer craziest undercover bittorrent positions riaa accidentally impossible graph desk strangest excuse slip worried youll creatures directors abandoned buildings boyfriend ton wiretap proposed hated lol defcon oops loser censors dominate dx unix dealers lolcat phishing rubiks served russias offered norton sec bullshit glass whether lowers sit grave editing wiretapping schwarzenegger cave grocery mb liquid loud dumb ninjas branch teenage aluminum owns discovers cheney dry commit marijuana slavery fool giulianis admin impeachment tried disturbing cookies mysteries gonzales reagan cpu loses built gorgeous levitation impeach diggcom futurama couch shops objects permission throws netflix cheneys damaged patrick quebec catching photograph alberto removing
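
For anyone who wants to try something similar, the procedure described above can be sketched in a few lines of Python. This is only an illustration, not the script used for this post. The function names (tokenize, top_words), the (title, diggs) input format, and the tokenization rule (lowercase, strip punctuation, split on whitespace) are all assumptions. The sketch also assumes a word is counted once per title, and it breaks ties alphabetically so the output is repeatable.

```python
import re
from collections import defaultdict


def tokenize(title):
    """Return the set of distinct words in a title.

    Assumed rule: lowercase the title, drop every character that is not
    a letter, digit, or whitespace, then split on whitespace.
    """
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return set(cleaned.split())


def top_words(stories, min_count=25, limit=100):
    """Rank words by the average Diggs of the stories whose titles contain them.

    stories: iterable of (title, diggs) pairs (hypothetical input format).
    min_count: minimum number of titles a word must appear in.
    limit: how many words to return.
    """
    digg_totals = defaultdict(int)   # word -> sum of Diggs over titles containing it
    title_counts = defaultdict(int)  # word -> number of titles containing it

    for title, diggs in stories:
        for word in tokenize(title):
            digg_totals[word] += diggs
            title_counts[word] += 1

    # Keep only words that appear often enough, then compute their averages.
    averages = {
        word: digg_totals[word] / count
        for word, count in title_counts.items()
        if count >= min_count
    }

    # Highest average first; ties are broken alphabetically.
    ranked = sorted(averages, key=lambda w: (-averages[w], w))
    return ranked[:limit]
```

Calling top_words(stories) with the defaults matches the thresholds in this post (at least 25 occurrences, top 100 words). On a small made-up sample, a lower min_count such as 2 is needed to get any output at all.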