TY - JOUR
T1 - Analyzing and Learning the Language for Different Types of Harassment
AU - Rezvan, Mohammadreza
AU - Shekarpour, Saeedeh
AU - Alshargi, Faisal
AU - Thirunarayan, Krishnaprasad
AU - Shalin, Valerie L.
AU - Sheth, Amit
PY - 2020/3/27
Y1 - 2020/3/27
N2 - THIS ARTICLE USES WORDS OR LANGUAGE THAT IS CONSIDERED PROFANE, VULGAR, OR OFFENSIVE BY SOME READERS. The presence of a significant amount of harassment in user-generated content and its negative impact calls for robust automatic detection approaches. This requires the identification of different types of harassment. Earlier work has classified harassing language in terms of hurtfulness, abusiveness, sentiment, and profanity. However, to identify and understand harassment more accurately, it is essential to determine the contextual type that captures the interrelated conditions in which harassing language occurs. In this paper we introduce the notion of contextual type in harassment by distinguishing between five contextual types: (i) sexual, (ii) racial, (iii) appearance-related, (iv) intellectual, and (v) political. We utilize an annotated corpus from Twitter distinguishing these types of harassment. We study the context of each kind to shed light on the linguistic meaning, interpretation, and distribution, with results from two lines of investigation: an extensive linguistic analysis, and the statistical distribution of unigrams. We then build type-aware classifiers to automate the identification of type-specific harassment. Our experiments demonstrate that these classifiers provide competitive accuracy for identifying and analyzing harassment on social media. We present extensive discussion and significant observations about the effectiveness of type-aware classifiers using a detailed comparison setup, providing insight into the role of type-dependent features.
KW - Twitter
KW - Language
KW - Semantics
KW - Social media
KW - Lexicons
KW - Emotions
KW - Support vector machines
KW - Facebook
UR - https://corescholar.libraries.wright.edu/cse/551
UR - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227330
U2 - 10.1371/journal.pone.0227330
DO - 10.1371/journal.pone.0227330
M3 - Article
C2 - 32218569
VL - 15
JO - PLoS One
JF - PLoS One
IS - 3
M1 - e0227330
ER -