Distributed Information Systems Laboratory LSIR

Social network dataset

ABSTRACT:

As mobile cloud computing enables a wide spectrum of smart applications, the need to fuse the various types of data available in the cloud grows rapidly. In particular, social and sensor data lie at the core of such applications but are typically processed separately. In this work, we explore the potential of fusing social and sensor data related to a location entity in the cloud, and present a practical application: a travel recommendation system that offers predicted mood information of people for where and when users wish to travel.
The system is built upon a conceptual framework that blends heterogeneous social and sensor data for integrated analysis, extracting weather-dependent mood information about people from Twitter and meteorological sensor data streams. To handle massive streaming data, the system employs various cloud-serving systems, such as Hadoop, HBase, and GSN.
Using this scalable system, we performed heavy ETL as well as filtering jobs, resulting in 12 million tweets over four months. We then derived a rich set of interesting findings through the data fusion, demonstrating that our approach is effective and scalable and can serve as an important basis for fusing social and sensor data in the cloud.

  • Social sensor fusion data [Download]
  • This dataset consists of:
    twitter.csv - contains tweets related to London collected over a period of three months. CSV format: (#Timestamp,#Location,#TweetMessage,@twitterUser)
    wunder2.csv - contains weather-related information for London over a period of three months. CSV format: (#Timestamp,#Location,#Weather-Label)
    Weather.Labels - weather labels considered for this work.
    anew.txt - contains the mood metrics for different words.
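
The files above could be combined along the following lines. This is a minimal sketch, not the authors' pipeline, and it rests on several assumptions that the README does not confirm: that both CSV files are plain comma-separated with the column order listed above, that tweet and weather rows can be matched on an identical (timestamp, location) pair, and that the mood metrics can be reduced to a single score per word. The function names, field names, and sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Column order taken from the file descriptions; the field names are our own.
TWEET_FIELDS = ["timestamp", "location", "message", "user"]
WEATHER_FIELDS = ["timestamp", "location", "label"]


def read_rows(fileobj, fields):
    """Parse comma-separated rows into dicts.

    Blank lines, lines whose first cell starts with '#', and rows with an
    unexpected number of columns are skipped.
    """
    rows = []
    for rec in csv.reader(fileobj):
        if not rec or rec[0].startswith("#"):
            continue
        if len(rec) != len(fields):
            continue
        rows.append(dict(zip(fields, (value.strip() for value in rec))))
    return rows


def mean_score(message, lexicon):
    """Average the lexicon scores of the words found in a message.

    Returns None when no word of the message is in the lexicon.
    """
    scores = [lexicon[w] for w in message.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else None


def mood_by_weather(tweets, weather, lexicon):
    """Average tweet mood per weather label.

    Assumption: a tweet matches a weather row when timestamp and location
    are exactly equal. Real data would likely need time bucketing.
    """
    label_at = {(w["timestamp"], w["location"]): w["label"] for w in weather}
    buckets = defaultdict(list)
    for tweet in tweets:
        label = label_at.get((tweet["timestamp"], tweet["location"]))
        score = mean_score(tweet["message"], lexicon)
        if label is not None and score is not None:
            buckets[label].append(score)
    return {label: sum(s) / len(s) for label, s in buckets.items()}


if __name__ == "__main__":
    # Made-up sample rows and scores, for illustration only.
    tweets = read_rows(
        io.StringIO("t1,London,what a happy day,@alice\n"), TWEET_FIELDS
    )
    weather = read_rows(io.StringIO("t1,London,Sunny\n"), WEATHER_FIELDS)
    print(mood_by_weather(tweets, weather, {"happy": 8.0}))
```

To run this against the real files, replace the in-memory strings with `open("twitter.csv", newline="")` and `open("wunder2.csv", newline="")`, and build the lexicon from anew.txt once its layout has been checked.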

    Data license:

    1. This data can be redistributed freely.
    2. This data cannot be modified when redistributed.
    3. The data must always be accompanied by this README file.
    4. Any work using this data must cite the following publication:
    @INPROCEEDINGS{Yerva:2012:Fusion,
         author = {Yerva, Surender Reddy and Jeung, Ho Young and Aberer, Karl},
         title = {Cloud based {S}ocial and {S}ensor {D}ata {F}usion},
         booktitle = {{FUSION}},
         year = {2012},
         affiliation = {EPFL},
         details = {http://infoscience.epfl.ch/record/176770},
         location = {Singapore}
    }
    @INPROCEEDINGS{Yerva:2012:MDM,
         author = {Yerva, Surender Reddy and Saltarin, Jonnahtan and Jeung, Ho Young and Aberer, Karl},
         title = {Social and {S}ensor {D}ata {F}usion in the {C}loud},
         year = {2012},
         affiliation = {EPFL},
         details = {http://infoscience.epfl.ch/record/175136},
         location = {Bengaluru, India}
    }