A semantics-based measure of emoji similarity

Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Emoji have grown to become one of the most important forms of communication on the web. With their widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing, since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels, and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, we present a real-world use case of our emoji embedding models using a sentiment analysis task and show that our models outperform the previous best-performing emoji embedding model on this task. The EmoSim508 dataset and our emoji embedding models are publicly released with this paper and can be downloaded from http://emojinet.knoesis.org/.
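
As a rough illustration of the idea in the abstract, the sketch below builds an emoji vector from the words of a textual description and compares two emoji by cosine similarity. This is only a sketch under stated assumptions: the abstract does not say how word vectors are combined, so the averaging step, the toy 3-dimensional vectors, the sample descriptions, and the names `TOY_WORD_VECTORS`, `embed_text`, and `cosine_similarity` are all hypothetical and are not taken from the paper or from EmojiNet.

```python
import math

# Toy 3-dimensional word vectors. These values are invented for illustration;
# a real model would be trained on a corpus such as Twitter or Google News.
TOY_WORD_VECTORS = {
    "face": [0.9, 0.1, 0.0],
    "smile": [0.8, 0.3, 0.1],
    "happy": [0.7, 0.4, 0.0],
    "laugh": [0.8, 0.2, 0.2],
    "car": [0.0, 0.1, 0.9],
    "vehicle": [0.1, 0.0, 0.8],
}


def embed_text(text, word_vectors):
    """Average the vectors of the known words in `text` (an assumed combination rule)."""
    known = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not known:
        raise ValueError("no known words in text")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical descriptions standing in for machine-readable emoji meanings.
grinning = embed_text("happy smile face", TOY_WORD_VECTORS)
joy = embed_text("laugh happy face", TOY_WORD_VECTORS)
automobile = embed_text("car vehicle", TOY_WORD_VECTORS)

sim_close = cosine_similarity(grinning, joy)
sim_far = cosine_similarity(grinning, automobile)
print(f"face-like pair: {sim_close:.3f}, face vs. car: {sim_far:.3f}")
```

With these toy vectors the two face-like descriptions score higher than the face and car pair. Scores of this kind could then be compared against human-annotated similarity ratings, which is the role the abstract gives to EmoSim508.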

Original language: English
Title of host publication: WI '17: Proceedings of the International Conference on Web Intelligence
Publisher: Association for Computing Machinery, Inc
Pages: 646-653
Number of pages: 8
ISBN (Electronic): 9781450349512
State: Published - Aug 23 2017
Event: 16th IEEE/WIC/ACM International Conference on Web Intelligence - Leipzig, Germany
Duration: Aug 23 2017 to Aug 26 2017
Conference number: 16

Conference

Conference: 16th IEEE/WIC/ACM International Conference on Web Intelligence
Abbreviated title: WI 2017
Country/Territory: Germany
City: Leipzig
Period: 8/23/17 to 8/26/17

ASJC Scopus Subject Areas

  • Computer Networks and Communications
  • Artificial Intelligence
  • Software

Keywords

  • Emoji analysis and search
  • Emoji similarity
  • Semantic similarity

Disciplines

  • Bioinformatics
  • Communication
  • Communication Technology and New Media
  • Computer Sciences
  • Databases and Information Systems
  • Life Sciences
  • OS and Networks
  • Physical Sciences and Mathematics
  • Science and Technology Studies
  • Social and Behavioral Sciences
