TY - GEN
T1 - Selecting Labels for News Document Clusters
AU - Thirunarayan, Krishnaprasad
AU - Immaneni, Trivikram
AU - Shaik, Mastan Vali
PY - 2007
Y1 - 2007
N2 - This work deals with the determination of meaningful and terse cluster labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as a result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence into a set of significant stems to approximate its semantics for comparison. Finally, a cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word sequencing information lost in the formalization of semantic equivalence.
AB - This work deals with the determination of meaningful and terse cluster labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as a result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence into a set of significant stems to approximate its semantics for comparison. Finally, a cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word sequencing information lost in the formalization of semantic equivalence.
UR - http://www.scopus.com/inward/record.url?scp=38148999682&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38148999682&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-73351-5_11
DO - 10.1007/978-3-540-73351-5_11
M3 - Conference contribution
SN - 3540733507
SN - 9783540733508
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 119
EP - 130
BT - Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings
PB - Springer Verlag
T2 - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007
Y2 - 27 June 2007 through 29 June 2007
ER -