U.S. Department of Energy

Pacific Northwest National Laboratory

Social Media Analysis

Reducing complex social graphs to forms that people can readily understand and use

Ultimately, the purpose is to leverage human expertise through a time-saving tool that identifies key authors, topics, and posts in online social media. We have demonstrated that our methods can survey posts in one or more online communities to find those agents that act as information sources or that drive the discussion, direction, and opinion within a community. Analysts can then evaluate topics and posts by the identified agents to validate or refute responses to questions of interest for commercial and government entities. Our current methods are applicable to both English and foreign-language social media sources.


Our hypothesis is that complex data from online communities (known as weblogs, blogs, or microblogs) can be condensed into descriptions that - despite their terseness - still provide useful insight into the nature of the online communities.
The PNNL team has leveraged the observation that posts in online social communities follow a power law distribution, which implies that a few authors or information sources have a disproportionately large influence on an online community. Our key insight is that a few of these high-agency information sources drive (and potentially control) discourse and the information conveyed within the online community. Our methods separate these key agents from those that participate in “flame wars,” where responses tend to be of a personal nature; “echo chambers,” where comments tend to reaffirm commonly held opinions or positions; and information relays that merely repeat information provided by others.
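As a rough illustration of the concentration that a power law implies, the following Python sketch measures what share of all posts comes from the most prolific authors. It is not the PNNL method: the function name `top_author_share`, the `top_fraction` parameter, and the sample data are assumptions made for this example, and the code only measures concentration rather than fitting a power law.

```python
import math
from collections import Counter


def top_author_share(post_authors, top_fraction=0.1):
    """Return the fraction of posts written by the top `top_fraction` of authors.

    `post_authors` holds one author name per post (hypothetical input format).
    """
    counts = Counter(post_authors)
    if not counts:
        raise ValueError("post_authors must contain at least one post")
    # Post counts per author, largest first.
    ranked = sorted(counts.values(), reverse=True)
    # Always include at least one author in the "top" group.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Made-up sample: 10 authors, 100 posts, heavily skewed toward author "a".
sample_counts = {"a": 50, "b": 20, "c": 10, "d": 5, "e": 5,
                 "f": 3, "g": 3, "h": 2, "i": 1, "j": 1}
sample_posts = [name for name, n in sample_counts.items() for _ in range(n)]

print(top_author_share(sample_posts, top_fraction=0.1))
```

In this made-up sample, one author out of ten accounts for half of all posts, which is the kind of skew the observation above refers to.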


The PNNL team includes interdisciplinary experts with core competencies in: blog and graph analysis; massively parallel algorithms and CRAY XMT technology; and extensive experience delivering technology into the hands of working analysts.
Scraping social media from a variety of sources (blogs, microblogs, Twitter, etc.), we target online communities and posts through keyword searches. We then generate graphs where each node is an author and every edge represents either a post or a comment. Agents of interest (e.g., thought leaders or information leaders) are identified by triangulating across several metrics based on node centrality. High-ranking, low-frequency nodes (as implied by the power law) can be separated from huge amounts of data and presented to the analyst for further analysis.
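The graph-and-centrality step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PNNL implementation: the text does not name the centrality metrics used, so in-degree and a simple PageRank stand in for them here, and "triangulating" is approximated by averaging each author's rank position across the metrics. All function names and the sample interactions are hypothetical.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build a directed author graph from (commenter, author) pairs."""
    nodes = set()
    out_edges = defaultdict(set)
    for commenter, author in interactions:
        nodes.update((commenter, author))
        if commenter != author:  # ignore self-replies
            out_edges[commenter].add(author)
    return nodes, out_edges


def in_degree(nodes, out_edges):
    """Count how many distinct authors responded to each author."""
    score = {node: 0 for node in nodes}
    for targets in out_edges.values():
        for dst in targets:
            score[dst] += 1
    return score


def pagerank(nodes, out_edges, damping=0.85, iterations=50):
    """Simple power-iteration PageRank (assumed metric, not from the source)."""
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_edges.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out_edges.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def triangulate(*metrics):
    """Order nodes by their average rank position across all metrics."""
    totals = defaultdict(float)
    for metric in metrics:
        ordered = sorted(metric, key=lambda node: (-metric[node], node))
        for position, node in enumerate(ordered):
            totals[node] += position
    return sorted(totals, key=lambda node: (totals[node], node))


# Made-up interactions: (commenter, author of the post being commented on).
interactions = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
                ("erin", "alice"), ("alice", "bob"), ("carol", "bob"),
                ("dave", "erin")]
nodes, out_edges = build_graph(interactions)
ranking = triangulate(in_degree(nodes, out_edges), pagerank(nodes, out_edges))
print(ranking[:3])  # candidate agents of interest for analyst review
```

Combining rank positions rather than raw scores keeps metrics with different scales comparable. An analyst-facing tool would likely weight or filter the metrics differently.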
Validation is an essential part of our effort. The PNNL team validates results through expert knowledge, examining the top-ranked authors to determine whether they are noted individuals in their fields or whether their posts are informative and appear to reflect and/or drive the opinions of many others on the blogs. Current datasets include Twitter and blog data on climate change, H1N1, and recent flooding in the Atlanta area.


Website: http://cass-mt.pnl.gov/research/default.aspx

Article Title: Social networking

Article Added: 2010/08/16

Category(s): Intelligence Analysis



Last Update: 13 July 2011 | Pacific Northwest National Laboratory