The following results are part of a research paper presented at the Web Science track of the World Wide Web Conference 2014 in Seoul, South Korea:
Philipp Singer, Fabian Flöck, Clemens Meinhart, Elias Zeitfogel and Markus Strohmaier
"Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?"
For this study we examined the evolution of the link-sharing community reddit.com from 2008 until 2012 by analyzing all submissions during that time period (close to 60 million submissions). In particular, we looked at (a) how user submissions to reddit have evolved over time and (b) how redditors’ attention and their perception of submissions have changed.
Since its founding in 2005, reddit has grown into one of the largest online communities on the web. Today, reddit.com has more than 112 million unique visitors from over 195 countries each month.¹ It is ranked by alexa.com as the 69th and 27th most popular website in the world and in the U.S., respectively.²
We analyzed all submissions from Jan. 2008 to Dec. 2012, crawled via reddit’s API. For each submission, the timestamp, title, author, up- and downvotes, number of comments, and the link or text it contained were collected around 1-2 months after the initial submission. Overall, we analyzed 58,874,22 submissions in 125,662 distinct subreddits from 4,910,850 different authors linking to 1,841,239 distinct domains on the Web. To the best of our knowledge, this makes our work the most comprehensive longitudinal study of reddit's evolution to date, studying how both (i) user submissions and (ii) the community's allocation of attention have evolved over time.
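The per-submission fields listed above can be held in a simple record type, from which the headline dataset counts follow directly. The following Python sketch is illustrative only: the `Submission` class, its field names, and the convention of labeling self posts with the domain "self" are assumptions for this example, not the crawler or data format used in the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Submission:
    # Hypothetical record layout; field names are illustrative assumptions,
    # not reddit's API names. Self posts carry the domain label "self".
    created: date
    subreddit: str
    author: str
    domain: str
    ups: int
    downs: int
    num_comments: int


def dataset_summary(subs):
    """Count submissions and the distinct subreddits, authors and domains."""
    return {
        "submissions": len(subs),
        "subreddits": len({s.subreddit for s in subs}),
        "authors": len({s.author for s in subs}),
        "domains": len({s.domain for s in subs}),
    }


subs = [
    Submission(date(2008, 1, 5), "reddit.com", "alice", "nytimes.com", 10, 2, 4),
    Submission(date(2012, 12, 3), "AskReddit", "bob", "self", 50, 5, 120),
    Submission(date(2012, 12, 9), "pics", "alice", "imgur.com", 300, 40, 35),
]
print(dataset_summary(subs))
# {'submissions': 3, 'subreddits': 3, 'authors': 2, 'domains': 3}
```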
Our data analysis reveals two main findings: (i) a strong, ongoing diversification of topics (i.e., subreddits), coupled with (ii) a simultaneous concentration on a few selected domains and types of submissions (self posts and images) that seem to feature mostly user-generated content. This suggests that reddit has evolved from the “front page of the internet” (i.e., a gateway to external websites) into an increasingly diverse but self-referential community that focuses on, and progressively reinforces, its own user-generated content over external sources. We also suspect that users' posts increasingly draw on references from “reddit culture”, such as memes or inside jokes. In short, reddit seems increasingly occupied with itself rather than with the outside web.
We counted the monthly submissions to each of reddit's thousands of mostly user-created subreddits and observed that, since reddit’s inception, submissions have fragmented into an ever-increasing number of distinct subreddits. In the last month of our data set (Dec. 2012), 32,202 subreddits received one or more submissions, compared with only 213 at the beginning of 2008. The 20 biggest subreddits at the end of 2012 contained less than 40% of all submissions, whereas they contained around 70% and 80% in mid-2010 and mid-2008, respectively. These findings point to a strong diversification of topics across subreddits, although many topics and discourses might previously have existed within one of the more broadly themed subreddits, especially r/Reddit.com, which served as the default posting space in reddit's early phase. In the figures, we can see r/Reddit.com’s gradual demise, mainly due to the introduction of more and more user-founded subreddits.
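The two fragmentation measures above (the number of subreddits receiving at least one submission per month, and the share of submissions held by the 20 biggest subreddits) can be computed as in this minimal Python sketch. It assumes submissions are available as hypothetical `(month, subreddit)` pairs and that "biggest" is ranked within each month; the study's exact procedure may differ.

```python
from collections import Counter, defaultdict


def monthly_subreddit_stats(posts, top_k=20):
    """posts: iterable of (month, subreddit) pairs with month as 'YYYY-MM'.

    Returns {month: (active_subreddits, top_k_share)}: the number of subreddits
    with at least one submission that month, and the fraction of that month's
    submissions that went to its top_k largest subreddits.
    """
    per_month = defaultdict(Counter)
    for month, subreddit in posts:
        per_month[month][subreddit] += 1
    stats = {}
    for month, counts in sorted(per_month.items()):
        total = sum(counts.values())
        top = sum(n for _, n in counts.most_common(top_k))
        stats[month] = (len(counts), top / total)
    return stats


# Toy data: one dominant subreddit early on, an even spread later.
posts = [("2008-01", "reddit.com")] * 8 + [("2008-01", "politics")] * 2
names = ["pics", "funny", "gaming", "IAmA", "AskReddit"]
posts += [("2012-12", name) for name in names for _ in range(2)]
print(monthly_subreddit_stats(posts, top_k=1))
# {'2008-01': (2, 0.8), '2012-12': (5, 0.2)}
```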
However, it cannot be stated with certainty that the overall thematic diversity of submissions has in fact increased. Given the high number of similarly themed subreddits, some topics might simply have been outsourced from more general subreddits to sharpen their profile. What can be affirmed is that clearly distinct communities around topics had a chance to form in the secluded spaces of the subreddits, each with its own clear-cut rules and “submission ethics”. In sum, the diversification of subreddits fits well with the claim that reddit represents (the best) content from all over the Web. As the heterogeneity of the Web's content has undoubtedly grown substantially since 2008, reddit has seemingly been able to mirror this diversity and build sub-communities around it.
We counted the submissions linking to each distinct domain on the web (including self posts) and observed that both “.self” and “Imgur.com” have become the dominant domains that reddit submissions link to. At the end of 2012, around 27% of all submissions linked to Imgur, compared with only 7% in mid-2010 and 0% in mid-2008 (Imgur was launched in 2009). Self submissions show a similar picture, constituting around 30% of all submissions at the end of 2012. Thus, while subreddits diversified, submissions increasingly link to just a few domains, mainly self and Imgur.com.
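A minimal sketch of the domain count in Python, assuming each submission is given as a hypothetical `(month, url, is_self)` tuple. Self posts are grouped under a single "self" label and a leading "www." is stripped; how the study normalized hosts (for example, whether i.imgur.com was merged into imgur.com) is not stated here, so this sketch keeps other subdomains separate.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def submission_domain(url, is_self):
    """Label a submission by domain; all self posts share the label 'self'."""
    if is_self:
        return "self"
    # .hostname is lower-cased and excludes any port; None for empty URLs.
    host = urlparse(url).hostname or ""
    # Assumption: only a leading 'www.' is stripped; other subdomains are kept.
    return host[4:] if host.startswith("www.") else host


def monthly_domain_share(posts):
    """posts: iterable of (month, url, is_self) with month as 'YYYY-MM'.

    Returns {month: {domain: fraction of that month's submissions}}.
    """
    per_month = defaultdict(Counter)
    for month, url, is_self in posts:
        per_month[month][submission_domain(url, is_self)] += 1
    shares = {}
    for month, counts in sorted(per_month.items()):
        total = sum(counts.values())
        shares[month] = {domain: n / total for domain, n in counts.items()}
    return shares


posts = [
    ("2012-12", "http://imgur.com/a1", False),
    ("2012-12", "http://www.imgur.com/b2", False),
    ("2012-12", "", True),
    ("2012-12", "https://www.nytimes.com/story", False),
]
print(monthly_domain_share(posts)["2012-12"])
# {'imgur.com': 0.5, 'self': 0.25, 'nytimes.com': 0.25}
```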
To better understand these observations, we looked at a more fine-grained representation of submissions by manually classifying the top 100 domains into six categories: self, image, video, text, audio, and misc. Self posts have not always been redditors' favorite kind of submission: from 2008 to mid-2009, the majority of submissions linked to external textual content. Over time, the (likewise textual) self submissions overtook external textual submissions. Consistent with our observations regarding Imgur.com, image submissions in general have been growing. Looking more closely at where image and self posts are submitted, we see that r/funny, r/pics, and r/AdviceAnimals have become the most popular destinations for images, while self posts nowadays mostly go to r/AskReddit. These observations suggest that reddit’s community increasingly reinforces its own user-generated image and textual content.
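The manual classification into six categories amounts to a lookup table from domain to category. In the Python sketch below, the `CATEGORY` entries are a hypothetical excerpt rather than the study's actual top-100 mapping, and sending every unmapped domain to "misc" is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical excerpt of a manual domain -> category mapping (not the
# study's actual top-100 classification).
CATEGORY = {
    "self": "self",
    "imgur.com": "image",
    "i.imgur.com": "image",
    "youtube.com": "video",
    "nytimes.com": "text",
    "soundcloud.com": "audio",
}


def category_shares(domains):
    """Fraction of submissions per content category.

    Assumption: any domain missing from CATEGORY is counted as 'misc'.
    """
    counts = Counter(CATEGORY.get(d, "misc") for d in domains)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


print(category_shares(["self", "imgur.com", "i.imgur.com", "nytimes.com", "example.org"]))
# {'self': 0.2, 'image': 0.4, 'text': 0.2, 'misc': 0.2}
```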
Online communities usually display a large discrepancy between the number of users who submit content and those who mostly just consume it, with the latter being the clear majority on reddit as well. We therefore also wanted to make statements about these “lurkers”, to learn how reddit’s community actually perceives the described evolution of content and whether attention follows the emerging content. Consequently, we studied the two main mechanisms on reddit that capture attention and perception: (i) votes and (ii) comments. Our results indicate that redditors' attitude towards the offered content has generally become more positive over time (measured via the average score per submission). In more detail, we found that both the score and the number of comments are increasingly fragmented across different subreddits, which is in line with our observation that submissions diversify into more and more distinct subreddits. Hence, redditors seem to spread their interests over a series of distinct sub-communities.
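Measuring attitude via the average score per submission can be sketched as follows in Python. The sketch assumes each submission is a hypothetical `(month, ups, downs)` tuple and follows reddit's convention of taking a submission's score as upvotes minus downvotes.

```python
from collections import defaultdict


def average_score_by_month(posts):
    """posts: iterable of (month, ups, downs) with month as 'YYYY-MM'.

    Each submission's score is ups - downs; returns {month: mean score}.
    """
    scores = defaultdict(list)
    for month, ups, downs in posts:
        scores[month].append(ups - downs)
    return {month: sum(vals) / len(vals) for month, vals in sorted(scores.items())}


posts = [("2008-01", 10, 4), ("2008-01", 3, 3), ("2012-12", 40, 5), ("2012-12", 12, 7)]
print(average_score_by_month(posts))
# {'2008-01': 3.0, '2012-12': 20.0}
```

A rising monthly mean in such a series is what "more positive over time" refers to; the same grouping by subreddit instead of by month would show how score is spread across sub-communities.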
Further, similar to the posted content, we found that users’ attention clearly focuses on just a few domains (again Imgur and self). Self submissions have become the primary driver of conversations, as evident in the large number of comments attributable to self submissions. Image submissions (e.g., Imgur), on the other hand, have come to receive a dominant portion of the total votes on reddit: up from ~16% at the beginning of 2008 to ~85% at the end of 2012. This finding may reflect the concern voiced by redditors that the platform has evolved into an image board. In sum, the increase of both image and self submissions is accompanied by a surge in attention: a high number of votes for image submissions and a high number of comments for self submissions. This suggests that different types of submissions lend themselves to different types of community reactions, and that these reactions can change over time, sometimes drastically.
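The split of attention by content type can be expressed as each type's share of all votes and of all comments. This Python sketch assumes hypothetical `(content_type, votes, comments)` tuples in which `votes` is a non-negative vote count; whether the study counted upvotes, all votes, or score is an assumption here.

```python
from collections import Counter


def attention_shares(posts):
    """posts: iterable of (content_type, votes, comments).

    Returns two dicts: each content type's share of all votes and of all
    comments. 'votes' is assumed to be a non-negative count.
    """
    votes, comments = Counter(), Counter()
    for ctype, v, c in posts:
        votes[ctype] += v
        comments[ctype] += c
    total_v = sum(votes.values()) or 1  # guard against division by zero
    total_c = sum(comments.values()) or 1
    vote_share = {t: n / total_v for t, n in votes.items()}
    comment_share = {t: n / total_c for t, n in comments.items()}
    return vote_share, comment_share


vote_share, comment_share = attention_shares(
    [("image", 80, 5), ("self", 15, 60), ("text", 5, 35)]
)
print(vote_share["image"], comment_share["self"])
# 0.8 0.6
```

Computing both shares from the same records makes the asymmetry described above visible: one type can dominate votes while another dominates comments.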
The reddit community itself has shown great interest in the evolution of the platform, as many discussions on r/theoryofreddit reveal. Randy Olson, for instance, has looked at the evolution of submissions across subreddits (compare figure above); his results also show a trend towards subreddit diversification. By systematically studying and comparing the evolution of domains, content types, and the perception of submissions via comments and votes, our work was able to reveal dynamics well beyond these observations, namely the increasing self-reference and the shift of attention.
While the results presented here were already under scientific peer review, reddit's increasing self-reference was also analyzed and discussed in an r/theoryofreddit post by user blackstar9000, corroborating our observations regarding reddit turning its attention more and more inward.
There is also a handful of other academic papers on reddit that the reader might find interesting: Lakkaraju et al. studied how titles, submission times, and community choices affect the success of image submissions by investigating resubmitted images on reddit, showing that good content can speak for itself, although a good title has a positive effect on popularity. Gilbert investigated resubmissions of content to reddit and compared their eventual voting scores, finding that identical links are ignored by the community several times before achieving popularity. Weninger et al. focused on comment threads on reddit, showing that the highest-scoring comments are mostly submitted at early stages of the discussion. For the similar platform digg.com, comparable studies have been conducted, some juxtaposing Digg and reddit in specific aspects (e.g., Lerman).
¹ http://www.reddit.com/about/, as of Feb. 2, 2014
² http://www.alexa.com/siteinfo/reddit.com, as of Feb. 2, 2014
A user survey was posted to the subreddits r/theoryofreddit and r/samplesize from Nov. 24 until Dec. 1, 2013. This limited sampling and the self-selection of respondents must be taken into account when interpreting the results (e.g., users of other subreddits might give different answers). However, our analysis showed no notable difference between the answer patterns of the two subreddits, so their answers are reported in aggregate below. After filtering obvious spam answers, n=969 answers remained, 66% from r/theoryofreddit and 34% from r/samplesize. Note: some questions were optional and not answered by all users (for optional questions, the number of respondents "n" is given in the respective captions).
If you would like to use this survey data (aggregated only, with no individual answers for privacy reasons), please write to me.
If you would like to cite the survey, please cite the paper above.
We would like to thank all redditors who participated and those who gave helpful comments on the questions. We took the feedback very seriously for the second iteration of our survey, which is currently running.
Respondents: 66% from r/theoryofreddit, 34% from r/samplesize; runtime: Nov. 24 to Dec. 1, 2013
Below, we grouped several questions and their answers together:
Average: 2.19 y (Std.Dev. 1.32, Median 2)
Average: 3.08 h (Std.Dev. 1.85, Median 3)
Std.Dev. 14.41, Median 7
Average: 6.04 h (Std.Dev. 2.89, Median 5)
Average rank: 1.98 of 10 (Std.Dev. 1.69, Median 1)
Average (from discrete scale): 68.8% (Std.Dev. 2.13, Median 70%)
Average (from discrete scale): 5.27 (Std.Dev. 1.25, Median 6)
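The summary statistics reported for each survey question (average, standard deviation, median) can be reproduced with Python's `statistics` module. This is a sketch under two assumptions: unanswered optional questions are stored as `None` and skipped, and "Std.Dev." denotes the sample standard deviation (`statistics.stdev`). The survey's actual computation is not documented here.

```python
from statistics import mean, median, stdev


def summarize(answers):
    """Mean, sample standard deviation and median of one question's answers.

    Assumption: unanswered optional questions are None and are skipped, so n
    can differ between questions. stdev needs at least two answers.
    """
    given = [a for a in answers if a is not None]
    return {
        "n": len(given),
        "average": round(mean(given), 2),
        "std_dev": round(stdev(given), 2),
        "median": median(given),
    }


result = summarize([1, 2, None, 2, 4])
print(result)
# {'n': 4, 'average': 2.25, 'std_dev': 1.26, 'median': 2.0}
```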