A brief description of select ongoing projects in service analytics is provided below. If you need further information, please contact us.
Mining Twitter data: From content to connections
Microblogging has quickly grown into a leading form of online social interaction. Although many websites, such as FriendFeed, Dailybooth, and Tumblr, support microblogging, Twitter is the most popular microblogging platform. It has 500 million registered users, who post more than 400 million tweets every day. Twitter's ability to propagate real-time information to a wide set of users makes it a promising channel for disseminating vital information.
About our Twitter database and infrastructure at WSU: We collect streaming data from Twitter's firehose API, which gives us about 10 percent of all Twitter data: about 5 GB and about 19 million tweets each day. Because our data set is large and still growing, we have built a fully distributed database that can run queries in parallel through an API. Our setup is designed to minimize query time for big data processing. The system can retrieve and analyze a wide array of information from the Twitter data, including the retweet network, the follower and friend networks, Twitter lists, geo-location-based statistics, and topic models of tweets.
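As an illustration of one item in the list above, the following minimal sketch derives a weighted retweet network from tweet records. It assumes each record is a dictionary that follows the classic Twitter JSON layout, in which a retweet carries a `retweeted_status` object with its own `user.screen_name`. The function name `build_retweet_network` and the sample records are hypothetical and are not taken from the WSU system.

```python
from collections import Counter


def build_retweet_network(tweets):
    """Count directed edges (retweeter -> original author).

    Assumes the classic Twitter JSON layout: a retweet has a
    'retweeted_status' object holding the original tweet.
    """
    edges = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if not original:
            continue  # plain tweet, contributes no edge
        retweeter = tweet["user"]["screen_name"]
        author = original["user"]["screen_name"]
        edges[(retweeter, author)] += 1
    return edges


# Made-up records for illustration only.
sample_tweets = [
    {"user": {"screen_name": "alice"}, "text": "hello"},
    {"user": {"screen_name": "bob"},
     "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "bob"},
     "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "carol"},
     "retweeted_status": {"user": {"screen_name": "bob"}}},
]

retweet_edges = build_retweet_network(sample_tweets)
```

The edge weights can then be loaded into any graph library for network analysis. At the data volumes described above, the same counting step would have to run as a distributed aggregation rather than in a single process.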
Recent Work: Location-specific tweet detection and topic summarization in Twitter: We developed a novel framework to identify and summarize tweets that are specific to a particular geographical location. Our new weighting scheme, called Location Centric Word Co-occurrence (LCWC), uses the content of the tweets and the network information of the "twitterers" to identify location-specific tweets. Using this approach, the topics specific to a location of interest are summarized and presented to end users. In our analysis, we found that (a) the top trending tweets from a location are poor descriptors of location-specific tweets, (b) ranking tweets by their users' geo-location cannot establish the location specificity of the tweets, and (c) the users' network information plays an important role in determining the location-specific characteristics of the tweets.
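The LCWC formula is not given here, so the sketch below does not reproduce it. It only illustrates the general idea of word co-occurrence weighting by location: each word pair is scored by the fraction of its co-occurrences that come from tweets posted in the location of interest. It uses tweet content only and ignores the user network information that LCWC also draws on. The function names and sample tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets):
    """Count unordered word pairs that appear together in a tweet."""
    counts = Counter()
    for text in tweets:
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


def location_specificity(local_tweets, other_tweets):
    """Score each local word pair by local count / overall count.

    A simplified stand-in for a location-centric weight,
    NOT the authors' LCWC scheme.
    """
    local = cooccurrence_counts(local_tweets)
    overall = local + cooccurrence_counts(other_tweets)
    return {pair: local[pair] / overall[pair] for pair in local}


# Made-up tweets for illustration only.
local_tweets = ["Tigers game tonight", "Tigers game downtown"]
other_tweets = ["game tonight tv"]

scores = location_specificity(local_tweets, other_tweets)
```

Pairs such as ("game", "tigers") that occur only in the local tweets score 1.0, while pairs that are also common elsewhere score lower.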
Recommendations in Twitter
The surge in tweet volume has created a major problem of information overload. Two fundamental challenges that result from this overload are (i) the difficulty of predicting users' interests, retweet behavior, the temporal evolution of topics, and other network-based or content-based quantities, and (ii) the difficulty of recommending tweets, friends and followers, and auxiliary information such as URLs, news articles, and blogs. Our current work tackles the information overload problem by developing a tweet recommender system that recommends auxiliary tweets to users by modeling their temporally varying topical interests.
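One simple way to model temporally varying topical interests is sketched below. It is an assumption made for illustration, not the recommender described above: a user's topic profile is built with exponential time decay, and candidate tweets are ranked by cosine similarity to that profile. The half-life parameter, the topic labels, and all function names are hypothetical.

```python
import math


def interest_profile(history, now, half_life_days=7.0):
    """Build a topic profile in which older activity counts less.

    history: list of (day, {topic: weight}) pairs.
    A weight is halved for every half_life_days of age.
    """
    profile = {}
    for day, topics in history:
        decay = 0.5 ** ((now - day) / half_life_days)
        for topic, weight in topics.items():
            profile[topic] = profile.get(topic, 0.0) + decay * weight
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors."""
    dot = sum(value * b.get(topic, 0.0) for topic, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile, candidates, k=1):
    """Return the ids of the k candidates closest to the profile."""
    ranked = sorted(candidates,
                    key=lambda item: cosine(profile, item[1]),
                    reverse=True)
    return [tweet_id for tweet_id, _ in ranked[:k]]


# Made-up activity: sports two weeks ago, politics today.
history = [(0, {"sports": 1.0}), (14, {"politics": 1.0})]
profile = interest_profile(history, now=14)
candidates = [("t1", {"sports": 1.0}), ("t2", {"politics": 1.0})]
top = recommend(profile, candidates)
```

Because the sports activity is two half-lives old, its weight decays to 0.25, so the politics tweet `t2` ranks first.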