Distributed Information Systems Laboratory LSIR

Social network dataset

ABSTRACT:

Microblogging sites are a unique and dynamic Web 2.0 communication medium. Understanding the information flow in these systems can not only provide better insights into the underlying sociology, but is also crucial for applications such as content ranking, recommendation and filtering, spam detection and viral marketing. In this paper, we characterize the propagation of URLs in the social network of Twitter, a popular microblogging site. We track 15 million URLs exchanged among 2.7 million users over a 300-hour period. Data analysis uncovers several statistical regularities in the user activity, the social graph, the structure of the URL cascades and the communication dynamics. Based on these results we propose a propagation model that predicts which users are likely to mention which URLs. The model correctly accounts for more than half of the URL mentions in our data set, while maintaining a false positive rate lower than 15%.

  • Tweets [Download]
    Contains most of the tweets containing URLs that were tweeted between < insert start timestamp > and < insert end timestamp >. Each tweet's metadata is formatted as JSON, one tweet per line.

  • User Graph [Download]
    Contains the Twitter follower graph as of the time when the tweets from < tweets filename > were downloaded. The graph includes only users who have mentioned at least one URL, i.e. the users who appear in < tweets filename > as tweet authors.
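
Because the tweets file stores one JSON record per line, it can be read as a stream without loading it all into memory. The following is a minimal Python sketch of that idea, not part of the dataset distribution. It assumes each line is a JSON object; the field names are not documented in this README, so the sketch does not rely on any of them and instead tallies whichever top-level keys it finds. The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # Report which line was malformed instead of failing silently.
            raise ValueError(f"line {line_number}: invalid JSON ({err})") from err


def count_top_level_keys(tweets: Iterable[dict]) -> Counter:
    """Tally how often each top-level key occurs across the records.

    Assumes every record is a JSON object (a Python dict).
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.keys())
    return counts
```

To inspect a local copy, open the file with `open(path, encoding="utf-8")` and pass the file object to `iter_tweets`; feeding the result to `count_top_level_keys` shows which fields the records actually carry before any analysis depends on them.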

Data license:

    1. This data can be redistributed freely.
    2. This data cannot be modified when redistributed.
    3. The data must always be accompanied by this README file.
    4. Any work using this data must cite the following publication:
    @InProceedings{galuba-wosn10,
         author = {Galuba, Wojciech and Aberer, Karl and Chakraborty, Dipanjan and Despotovic, Zoran and Kellerer, Wolfgang},
         booktitle = {3rd {W}orkshop on {O}nline {S}ocial {N}etworks ({WOSN}'10)},
         title = {{O}uttweeting the {T}witterers - {P}redicting {I}nformation {C}ascades in {M}icroblogs},
         year = 2010
    }