TY - GEN
T1 - Stop Words Are Not “Nothing”
T2 - 11th International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction conference and Behavior Representation in Modeling and Simulation, SBP-BRiMS 2018
AU - Rüsenberg, Fabian
AU - Hampton, Andrew J.
AU - Shalin, Valerie L.
AU - Feufel, Markus A.
N1 - Publisher Copyright:
© 2018, Springer International Publishing AG, part of Springer Nature.
PY - 2018
Y1 - 2018
N2 - Social media research often exploits metrics based on frequency counts, e.g., to determine corpus sentiment. Hampton and Shalin [1] introduced an alternative metric examining the style and structure of social media relative to an Internet language baseline. They demonstrated statistically significant differences in lexical choice from tweets collected in a disaster setting relative to the standard. One explanation of this finding is that the Twitter platform, irrespective of disaster setting, and/or specifics of the English language, is responsible for the observed differences. In this paper, we apply the same metric to German corpora, comparing an event-based crawl (the recent election) with a “nothing” crawl, with respect to the use of German modal particles. German modal particles are often used in spoken language and typically regarded as stop words in text mining. This word class is likely to reflect public engagement because of its properties, such as indicating common ground or reference to previous utterances (i.e., anaphora) [2, 3]. We demonstrate a positive deviation of most modal particles for all corpora relative to general Internet language, consistent with the view that Twitter constitutes a form of conversation. However, the use of modal particles also generally increased in the three corpora related to the 2017 German election relative to the “nothing” corpus. This indicates topic influence beyond platform affordances and supports an interpretation of the German election data as an engaged, collective narrative response to events. Using commonly eliminated features, our finding supports and extends Hampton and Shalin’s analysis, which relied on pre-selected antonyms, and suggests an alternative to frequency counts for identifying corpora that differ in public engagement.
KW - Big data
KW - Collective narrative
KW - Common ground
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85049804110&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049804110&partnerID=8YFLogxK
UR - https://corescholar.libraries.wright.edu/psychology/552
U2 - 10.1007/978-3-319-93372-6_11
DO - 10.1007/978-3-319-93372-6_11
M3 - Conference contribution
SN - 9783319933719
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 89
EP - 96
BT - Social, Cultural, and Behavioral Modeling
A2 - Bisgin, Halil
A2 - Thomson, Robert
A2 - Hyder, Ayaz
A2 - Dancy, Christopher
PB - Springer Verlag
Y2 - 10 July 2018 through 13 July 2018
ER -