Harnessing Twitter "Big Data" for Automatic Emotion Identification

Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

User-generated content on Twitter (produced at an enormous rate of 340 million tweets per day) provides a rich source for gleaning people's emotions, which is necessary for a deeper understanding of people's behaviors and actions. Extant studies on emotion identification lack comprehensive coverage of "emotional situations" because they use relatively small training datasets. To overcome this bottleneck, we have automatically created a large emotion-labeled dataset (of about 2.5 million tweets) by harnessing emotion-related hashtags available in the tweets. We have applied two different machine learning algorithms for emotion identification, to study the effectiveness of various feature combinations as well as the effect of the size of the training data on the emotion identification task. Our experiments demonstrate that a combination of unigrams, bigrams, sentiment/emotion-bearing words, and parts-of-speech information is most effective for gleaning emotions. The highest accuracy (65.57%) is achieved with training data containing about 2 million tweets.
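
The hashtag-based labeling and the n-gram part of the feature set described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the hashtag-to-emotion mapping `EMOTION_HASHTAGS`, the tokenizer, and the function names `label_tweet` and `ngram_features` are all assumptions made for the example, and the sentiment/emotion lexicon features, part-of-speech features, and the classifiers themselves are left out.

```python
import re
from collections import Counter

# Hypothetical hashtag-to-emotion mapping; the abstract does not list the actual tags.
EMOTION_HASHTAGS = {
    "happy": "joy",
    "joy": "joy",
    "sad": "sadness",
    "angry": "anger",
    "scared": "fear",
}

HASHTAG_RE = re.compile(r"#(\w+)")
TOKEN_RE = re.compile(r"[a-z0-9']+")  # simple tokenizer, an assumption for this sketch


def label_tweet(text):
    """Return (emotion, cleaned_text), or None if the tweet has no single emotion label."""
    tags = [t.lower() for t in HASHTAG_RE.findall(text)]
    emotions = {EMOTION_HASHTAGS[t] for t in tags if t in EMOTION_HASHTAGS}
    if len(emotions) != 1:
        return None  # unlabeled, or hashtags point to conflicting emotions
    # Drop the label-bearing hashtags so a classifier cannot read the label off the text.
    cleaned = HASHTAG_RE.sub(
        lambda m: "" if m.group(1).lower() in EMOTION_HASHTAGS else m.group(0),
        text,
    )
    return emotions.pop(), " ".join(cleaned.split())


def ngram_features(text):
    """Count unigrams and bigrams; bigram keys join the two tokens with '_'."""
    tokens = TOKEN_RE.findall(text.lower())
    feats = Counter(tokens)
    feats.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return feats
```

For example, `label_tweet("Got the job today #happy")` yields the label `joy` with the hashtag removed from the text, while a tweet carrying both `#happy` and `#sad` is skipped as ambiguous. Discarding ambiguous tweets is a design choice of this sketch, not something the abstract states.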

Original language: English
Title of host publication: 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing
Publisher: IEEE
Pages: 587-592
Number of pages: 6
ISBN (Electronic): 978-0-7695-4848-7
ISBN (Print): 978-1-4673-5638-1
State: Published - 2012
Event: 2012 ASE/IEEE International Conference on Social Computing, SocialCom 2012 and the 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2012 - Amsterdam, Netherlands
Duration: Sep 3 2012 to Sep 5 2012

Conference

Conference: 2012 ASE/IEEE International Conference on Social Computing, SocialCom 2012 and the 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2012
Country/Territory: Netherlands
City: Amsterdam
Period: 9/3/12 to 9/5/12

ASJC Scopus Subject Areas

  • Safety, Risk, Reliability and Quality

Keywords

  • Emotion Analysis
  • Emotion Identification
  • Emotion Intelligence
  • Twitter

Disciplines

  • Bioinformatics
  • Communication Technology and New Media
  • Databases and Information Systems
  • OS and Networks
  • Science and Technology Studies
