
reddit research and data.

reddit evolution paper.

research paper about reddit's growth and change from 2008 to 2012, titled "evolution of reddit: from the front page of the internet to a self-referential community?"

reddit user survey.

a reddit survey that was conducted in november 2013 in r/theoryofreddit and r/samplesize. results partly referenced in the reddit evolution paper.

reddit evolution paper.

The following results are part of a research paper at the Web Science track of the World Wide Web conference 2014 in Seoul, South Korea:

Philipp Singer, Fabian Flöck, Clemens Meinhart, Elias Zeitfogel and Markus Strohmaier

"Evolution of Reddit: From the Front Page of the Internet to a Self-referential Community?"

For this study we examined the evolution of the link-sharing community reddit.com from 2008 until 2012 by analyzing all submissions during that time period (close to 60 million submissions). In particular, we looked at (a) how user submissions to reddit have evolved over time and (b) how redditors’ attention and their perception of submissions have changed.

why research reddit?

Since its founding in 2005, reddit has grown into one of the largest online communities on the web. Today, reddit.com has more than 112 million unique visitors from over 195 countries each month.¹ It is ranked by alexa.com as the 69th most popular website in the world and the 27th most popular in the U.S.²

data used.

We analyzed all submissions from Jan. 2008 to Dec. 2012, crawled through reddit’s API. For each submission, the timestamp, title, author, up- and downvotes, number of comments, and the link or text it contained were collected around 1-2 months after the initial submission. Overall, we analyzed 58,874,22 submissions in 125,662 distinct subreddits from 4,910,850 different authors linking to 1,841,239 distinct domains on the Web. To the best of our knowledge, this makes our work the most comprehensive longitudinal study of Reddit's evolution to date, covering both how (i) user submissions and (ii) the community's allocation of attention have evolved over time.
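The fields listed above can be pictured as one record per crawled submission. The following Python sketch is a hypothetical representation (the `Submission` class, its field names and the `corpus_summary` helper are our own illustration, not the paper's schema or reddit's API field names) showing how distinct-count totals like the ones reported above could be derived from such records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of one crawled submission (illustrative field names)."""
    created: datetime   # submission timestamp
    title: str
    author: str
    subreddit: str
    domain: str         # linked domain, or a "self..." marker for text posts
    ups: int
    downs: int
    num_comments: int


def corpus_summary(submissions):
    """Count submissions plus distinct subreddits, authors and domains."""
    return {
        "submissions": len(submissions),
        "subreddits": len({s.subreddit for s in submissions}),
        "authors": len({s.author for s in submissions}),
        "domains": len({s.domain for s in submissions}),
    }


if __name__ == "__main__":
    t = datetime(2012, 12, 1)
    data = [
        Submission(t, "a", "alice", "pics", "imgur.com", 10, 2, 3),
        Submission(t, "b", "bob", "pics", "imgur.com", 5, 1, 0),
        Submission(t, "c", "alice", "AskReddit", "self.AskReddit", 7, 3, 40),
    ]
    print(corpus_summary(data))
```

Using sets keeps each distinct count a single pass over the data; on the full corpus the same idea would typically be run inside a database or a streaming job rather than in memory.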

main findings.

Our data analysis reveals two main findings:
(i) a strong, ongoing diversification of topics (i.e., subreddits) coupled with
(ii) a simultaneous concentration towards a few selected domains and types of submissions (self and images) that seem to feature mostly user-generated content.
This suggests that reddit has evolved from the “front page of the internet” (i.e., a gateway to external websites) to an increasingly diverse but self-referential community that focuses on and progressively reinforces its own user-generated content over external sources. We also suspect that user posts increasingly draw on references from the “reddit culture”, such as memes or inside jokes. In short, reddit seems to be increasingly occupied with itself rather than with the outside web.

the diversification of subreddits.

We counted the monthly submissions to each of reddit's thousands of mostly user-created subreddits and found that, since reddit’s inception, submissions have fragmented into an ever-increasing number of distinct subreddits. In the last month of our data set (December 2012), 32,202 subreddits received one or more submissions, while only 213 did so at the beginning of 2008. The 20 biggest subreddits at the end of 2012 contained less than 40% of all submissions, while they contained around 70% and 80% in mid-2010 and mid-2008, respectively. These findings point to a strong diversification of topics represented by the different subreddits, although many topics and discourses might previously have existed as part of one of the broadly themed subreddits, especially r/Reddit.com, which served as the default posting space in reddit's early phase. In the figures, we can see the gradual demise of r/Reddit.com, mainly due to more and more user-founded subreddits being introduced.
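The two measures described above (the number of subreddits receiving at least one submission per month, and the share of a month's submissions that falls into the biggest subreddits) can be sketched as follows. This is a minimal illustration under our own assumptions: the input is a list of (month, subreddit) pairs, `top_k_share` is a hypothetical helper, and it takes the k largest subreddits within each month, whereas the figures may instead track one fixed set of subreddits.

```python
from collections import Counter, defaultdict


def top_k_share(pairs, k=20):
    """Per month, return the number of active subreddits and the share of
    submissions that went to the k largest subreddits of that month.

    pairs: iterable of (month, subreddit) tuples, one per submission,
    where month is a sortable label such as "2012-12".
    """
    per_month = defaultdict(Counter)
    for month, subreddit in pairs:
        per_month[month][subreddit] += 1

    result = {}
    for month, counts in sorted(per_month.items()):
        total = sum(counts.values())
        top = sum(n for _, n in counts.most_common(k))
        # "active" = subreddits with at least one submission in that month
        result[month] = {"active": len(counts), "top_k_share": top / total}
    return result
```

For example, a month with 8 submissions to r/reddit.com and 2 to r/pics yields a `top_k_share` of 0.8 for k=1; a falling value over successive months is what fragmentation looks like in this measure.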

However, it cannot be stated with certainty that the general thematic diversity of submissions has in fact increased. Given the high number of similarly themed subreddits, some topics might simply have been outsourced from more general subreddits to sharpen their profile. What can be affirmed is that clearly distinct communities around topics had a chance to form in the secluded spaces of the subreddits, each with its own clear-cut rules and “submission ethics”. In sum, the subreddit diversification fits well with the claim that reddit represents (the best) content from all over the Web. As the heterogeneity of the Web's content has undoubtedly grown exponentially since 2008, reddit has seemingly been able to mirror this diversity and build sub-communities around it.

concentration on certain domains, ".self" and images.

We counted the submissions linking to each distinct domain on the web (including self posts) and observed that both “.self” and “Imgur.com” have evolved into the dominant domains that reddit submissions link to. At the end of 2012, around 27% of all submissions linked to Imgur, compared with only around 7% and 0% in mid-2010 and mid-2008, respectively. Self submissions show a similar picture, constituting around 30% of all submissions at the end of 2012. Thus, while subreddits diversified, submissions increasingly link to just a few domains, mainly self and Imgur.com.
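A sketch of the per-domain share described above, assuming each submission is given as a (month, domain) pair. On reddit, text posts carry a domain of the form `self.<subreddit>`; the sketch folds all of these into a single "self" bucket and strips a leading `www.`, which is our assumption about how domains are grouped here, not a documented step of the study. The helper names are hypothetical.

```python
from collections import Counter, defaultdict


def normalize_domain(domain):
    """Fold reddit's per-subreddit 'self.<name>' domains into one 'self'
    bucket and strip a leading 'www.' so variants of a domain count together."""
    domain = domain.lower()
    if domain.startswith("self."):
        return "self"
    if domain.startswith("www."):
        domain = domain[4:]
    return domain


def domain_shares(pairs):
    """Return {month: {domain: fraction of that month's submissions}}."""
    per_month = defaultdict(Counter)
    for month, domain in pairs:
        per_month[month][normalize_domain(domain)] += 1

    shares = {}
    for month, counts in per_month.items():
        total = sum(counts.values())
        shares[month] = {d: n / total for d, n in counts.items()}
    return shares
```

Without the "self" folding, every subreddit would produce its own self domain, and the concentration on self posts would be split across thousands of entries.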

To better understand these observations, we looked at a more fine-grained representation of submissions by manually classifying the top 100 domains into six categories: self, image, video, text, audio and misc. Self posts have not always been redditors' favorite kind of submission: from 2008 to mid-2009, the majority of submissions linked to external textual content. Over time, the (likewise textual) self submissions came to outnumber external textual submissions. Congruent with the observations regarding Imgur.com, image submissions in general have been growing. Looking more closely at where image and self posts are submitted, r/funny, r/pics and r/AdviceAnimals have become the most popular destinations for images, while self posts nowadays mostly go to r/AskReddit. These observations suggest that reddit’s community increasingly reinforces its own user-generated image and text content.
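The manual classification can be thought of as a lookup table from domain to one of the six categories. The table below is purely illustrative: the domains and their assignments are hypothetical examples, not the paper's actual top-100 classification. Domains are assumed to be already normalized (all self posts as "self"), and because the text does not say how domains outside the top 100 were handled, the sketch counts them in a separate "unclassified" bucket.

```python
from collections import Counter

# Hypothetical excerpt of a manual domain -> category table.
# The six category names follow the text; the assignments are examples only.
CATEGORY_OF_DOMAIN = {
    "self": "self",
    "imgur.com": "image",
    "youtube.com": "video",
    "nytimes.com": "text",
    "soundcloud.com": "audio",
}


def category_shares(domains, table=CATEGORY_OF_DOMAIN):
    """Share of submissions per content category for one month.

    domains: list of normalized domains, one per submission.
    Domains missing from the table are counted as 'unclassified'.
    """
    counts = Counter(table.get(d, "unclassified") for d in domains)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

Keeping the classification as plain data makes it easy to review and revise by hand, which suits a manually curated list of 100 domains.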

attention and perception.

Online communities usually display a large discrepancy between the number of users who submit content and those who mostly just consume it (the latter being the clear majority on reddit as well). We therefore also wanted to make statements about these “lurkers”: how reddit’s community actually perceives the evolution of content shown above, and whether attention follows the emerging content. Consequently, we studied the two main mechanisms on reddit that capture attention and perception: (i) votes and (ii) comments. Our results indicate that redditors' attitude towards offered content has generally become more positive over time (measured via the average score per submission). In more detail, we found that both the score and the number of comments are increasingly fragmented across different subreddits, which is in line with our observation that posted submissions diversify into more and more distinct subreddits. Hence, redditors seem to spread their interests over a series of distinct sub-communities.
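The two attention measures mentioned above (average score per submission, and how comments are spread over subreddits) can be sketched per month as follows. This assumes that a submission's score is upvotes minus downvotes and that input rows are (month, subreddit, ups, downs, num_comments) tuples; `monthly_attention` is a hypothetical helper, not the paper's code.

```python
from collections import defaultdict


def monthly_attention(rows):
    """rows: iterable of (month, subreddit, ups, downs, num_comments).

    Returns, per month, the average score per submission and each
    subreddit's share of that month's comments.
    """
    score_sum = defaultdict(int)   # month -> summed score
    n_subs = defaultdict(int)      # month -> number of submissions
    comments = defaultdict(lambda: defaultdict(int))  # month -> subreddit -> comments
    for month, subreddit, ups, downs, num_comments in rows:
        score_sum[month] += ups - downs  # assumed score definition
        n_subs[month] += 1
        comments[month][subreddit] += num_comments

    result = {}
    for month in n_subs:
        total_comments = sum(comments[month].values())
        result[month] = {
            "avg_score": score_sum[month] / n_subs[month],
            "comment_share": {
                s: c / total_comments for s, c in comments[month].items()
            } if total_comments else {},
        }
    return result
```

Comment shares are used here because comment counts are never negative, so the shares always lie between 0 and 1; score shares would need extra care when negative scores occur.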

Further, as with the posted content, we found that users’ attention clearly focuses on just a few domains (again Imgur and self). Self submissions have evolved into the primary driver of conversations, as evidenced by the large number of comments attributable to them. Image (e.g., Imgur) submissions, on the other hand, have come to receive a dominant portion of the total votes on Reddit: up from ~16% at the beginning of 2008 to ~85% at the end of 2012. This finding may reflect some redditors' concern that the platform has turned into an image board. In sum, the increase in both image and self submissions is accompanied by a surge in attention: a high number of votes for image submissions and a high number of comments for self submissions. This suggests that different types of submissions lend themselves to different types of community reactions, and that these reactions can change over time, sometimes drastically.
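The vote shares described above can be expressed as each content type's fraction of all votes cast in a month, with comment shares computed the same way. A minimal sketch, assuming that "votes" means upvotes plus downvotes and that rows are (content_type, ups, downs, num_comments) tuples for a single month; the helper name is hypothetical.

```python
from collections import defaultdict


def attention_by_type(rows):
    """rows: iterable of (content_type, ups, downs, num_comments) for one month.

    Returns each content type's share of all votes (ups + downs) and of
    all comments in that month.
    """
    votes = defaultdict(int)
    comments = defaultdict(int)
    for ctype, ups, downs, num_comments in rows:
        votes[ctype] += ups + downs  # assumed definition of "votes"
        comments[ctype] += num_comments
    total_votes = sum(votes.values()) or 1  # avoid division by zero
    total_comments = sum(comments.values()) or 1
    return {
        ctype: {
            "vote_share": votes[ctype] / total_votes,
            "comment_share": comments[ctype] / total_comments,
        }
        for ctype in votes
    }
```

Computing both shares from the same rows shows that the two attention signals can diverge: one content type can dominate votes while another dominates comments.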

community discussions/analysis and related research.

The reddit community itself has shown great interest in the evolution of the platform, as many discussions on r/theoryofreddit reveal. Randy Olson, for instance, has looked at the evolution of submissions across subreddits (compare figure above); his results also show a trend towards subreddit diversification. By systematically studying and comparing the evolution of domains, content types and the perception of submissions via comments and votes, our work was able to reveal dynamics well beyond these observations, namely the increasing self-reference and the shift of attention.

While the results presented here were already under scientific peer review, reddit's increasing self-reference was also analyzed and discussed in an r/theoryofreddit post by user blackstar9000, corroborating our observation that reddit is turning its attention more and more inward.

There is also a handful of other academic research papers regarding reddit the reader might find interesting:
Lakkaraju et al. studied how titles, submission times and community choices of image submissions affect the success of the content by investigating resubmitted images on Reddit, showing that good content can speak for itself, although a good title has a positive effect on popularity. Gilbert investigated resubmissions of content to Reddit and compared their eventual voting score, finding that identical links are ignored by the community several times before achieving popularity. Weninger et al. focus on comment threads on Reddit, showing that highest scoring comments are mostly submitted at early stages of the discussion. For the similar platform digg.com, studies comparable to the above have been conducted, some juxtaposing Digg and Reddit in specific aspects (e.g., Lerman).

number of submissions over time.

reddit's growth has been exponential since 2008 and shows no signs of slowing down in our data. Among the growth models we tested, the exponential model gave the best fit.
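One common way to fit an exponential growth model N(t) = a * exp(b * t) to monthly submission counts is ordinary least squares on log N(t). The sketch below shows only that fit; the paper's exact fitting procedure and the alternative models it was compared against are not described here, so treat this as an assumption rather than a reconstruction. The helper names are hypothetical.

```python
import math


def fit_exponential(counts):
    """Fit N(t) = a * exp(b * t) to counts[t] (t = 0, 1, 2, ... months)
    by ordinary least squares on log N(t). Returns (a, b).

    Requires at least two months and strictly positive counts.
    """
    n = len(counts)
    xs = list(range(n))
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # growth rate per month
    a = math.exp(y_mean - b * x_mean)   # fitted count at t = 0
    return a, b


def doubling_time(b):
    """Months needed for the fitted count to double."""
    return math.log(2) / b
```

Fitting on the log scale turns the exponential into a straight line, which is why a simple linear regression suffices; comparing it against other growth models would additionally require a common goodness-of-fit criterion computed on the original scale.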

evolution of submissions per month over subreddits.

All active subreddits are depicted, each with its size relative to the total number of submissions on reddit at a given time (01/2008-12/2012). Since reddit’s inception, submissions have fragmented into an ever-increasing number of distinct subreddits. (The 20 largest subreddits are shown in distinct colors; the rest are combined in brown.)

evolution of submissions per month over domains.

Similar to the figure above, the relative proportion of submissions linking to specific domains is visualized over time. A concentration towards only a few domains, with a strong focus on imgur and self submissions, becomes apparent over time.

evolution of submissions per month over types of content.

By manually classifying the top 100 domains on reddit into content categories, a general shift towards self and image submissions over time emerges. (Same data as the graph above, with domains classified into content categories.)

evolution of submissions per month over subreddits for image posts (left) and self posts (right).

A closer look at where image and self posts get submitted to. While for images r/funny, r/pics and r/AdviceAnimals have become the most popular destinations, self posts nowadays mostly get posted to r/AskReddit.

evolution of score, number of comments and number of votes of submissions per month over domains.

The same investigation as in the previous figure, but this time by linked domain. Over time, users' attention as measured by the number of comments focuses on just a few domains (self, Imgur). Self submissions seem to be the driving factor of conversations. In contrast, Imgur submissions seem to have evolved into capturing the majority of the total score and number of votes on Reddit.

evolution of score, number of comments and number of votes of submissions per month over types of content.

Digging deeper into the content type of submissions (using our manual classification), we can clearly see that image submissions receive a dominant portion of the total votes, while the vote share of self submissions is slightly declining. In contrast, as seen before, self submissions have now evolved into capturing most of the discussion on Reddit in the form of comments.

[1] http://www.reddit.com/about/, as of Feb. 2, 2014

[2] http://www.alexa.com/siteinfo/reddit.com, as of Feb. 2, 2014

reddit user survey.

A user survey was posted to the subreddits r/theoryofreddit and r/samplesize from Nov. 24 until Dec. 1, 2013. This particular, limited sampling and the self-selection of respondents must be taken into account when interpreting the results (e.g., users of other subreddits might provide different answers). However, our analysis showed no notable difference between the answer patterns of the two subreddits, so the results are reported in aggregate below. After filtering obvious spam answers, n=969 answers remained, 66% from r/theoryofreddit and 34% from r/samplesize. Note: some questions were optional and not answered by all users (for optional questions, the number of respondents "n" is given in the respective captions).

If you would like to use this survey data, please write to me (aggregated data only; no individual answers, for privacy reasons).

If you would like to cite the survey, please cite the paper above.

We would like to thank all redditors who participated and those who gave helpful comments on the questions. We took this feedback very seriously for the second iteration of our survey, which is currently running.

respondents: 969

from r/theoryofreddit: 66%
from r/samplesize: 34%
runtime: Nov. 24 - Dec. 1, 2013

Below, we grouped several questions and their answers together:

respondent demographics.

Gender of respondents

[single choice, optional: n=669]

Region where respondents live

[single choice, optional: n=665]

Age groups of respondents

[single choice, optional: n=675]

How long have you been active on reddit?

[numeric field, in years, optional: n=644]

Average: 2.19 y

Std.Dev. 1.32 | Median 2
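Each numeric question is summarized by its average, standard deviation and median over the respondents who answered it (for optional questions, n counts only those respondents). A minimal sketch using Python's `statistics` module; whether the survey reports the sample or the population standard deviation is not stated, so using `statistics.stdev` (sample) is our assumption, and `summarize` is a hypothetical helper.

```python
import statistics


def summarize(answers):
    """Summarize an optional numeric survey field.

    answers: list of values, with None for respondents who skipped the question.
    Returns n, average, sample standard deviation and median
    (average and standard deviation rounded to 2 decimals).
    """
    values = [a for a in answers if a is not None]
    return {
        "n": len(values),
        "average": round(statistics.mean(values), 2),
        "std_dev": round(statistics.stdev(values), 2),
        "median": statistics.median(values),
    }
```

Reporting the median next to the average is useful for skewed answers such as "websites visited daily", where a few very large values pull the average (11.81) well above the median (7).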

frequency of visit. | popularity.

How much time do you spend on reddit on average per day?

[numeric field, in hours]

Average: 3.08 h

Std.Dev. 1.85 | Median 3

How many different websites do you usually visit daily?

[numeric field]

Average: 11.81

Std.Dev. 14.41 | Median 7

How much time do you spend on the internet (total, including reddit) on average per day?

[numeric field, in hours]

Average: 6.04 h

Std.Dev. 2.89 | Median 5

Where is reddit ranked among the top sites you usually visit daily?

[single choice, scale 1-10, excl. 0.6% "not in top 10"]

Average rank: 1.98 (of 10)

Std.Dev. 1.69 | Median 1

defining reddit. | types of usage.

How would you characterize reddit in terms of some common descriptions used for websites?

[multiple choice]

From the time you procrastinate per day, how much of that procrastination is done via reddit?

[single choice, 11-point scale, daily average]

Average (from discrete scale): 68.8%

Std.Dev. 2.13 | Median 70%
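The procrastination question used an 11-point scale, and its average is reported as a percentage. Assuming the scale points correspond to 0%, 10%, 20% and so on up to 100% (our reading; the scale labels are not reproduced on this page), such an average can be computed by mapping each answer to its percentage first. Both helpers below are hypothetical.

```python
import statistics


def scale_to_percent(point, points=11):
    """Map a discrete scale point (0 .. points - 1) to a percentage."""
    if not 0 <= point < points:
        raise ValueError("scale point out of range")
    return 100 * point / (points - 1)


def average_percent(answers, points=11):
    """Average of discrete scale answers, expressed in percent."""
    return statistics.mean(scale_to_percent(p, points) for p in answers)
```

Because every answer falls on one of the eleven discrete points, the median is always a multiple of 10%, while the average can fall anywhere in between.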

Is reddit the main website through which you access a specific type of content on the web?

[multiple choice]

When you use reddit, how would you describe your surfing behaviour? (Specific goal vs. exploring)

[single choice, scale 1-7, only extremes labeled]

Average (from discrete scale): 5.27

Std.Dev. 1.25 | Median 6

question. "How do you divide your activities among the different topics/aspects of reddit?"

1. Posting content

[single choice, optional: n=669]

2. Up- and downvoting

[single choice, optional: n=670]

3. Commenting + discussing

[single choice, optional: n=665]

4. 1-to-1 interaction with other users

[single choice, optional: n=671]

5. Reading or watching news

[single choice, optional: n=669]

6. Reading or watching entertainment content

[single choice, optional: n=667]

7. Reading or watching useful or educating content (manuals, DIY, advice, etc.)

[single choice, optional: n=669]

8. Other

[single choice, optional: n=513]