Selecting Labels for News Document Clusters

Krishnaprasad Thirunarayan, Trivikram Immaneni, Mastan Vali Shaik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This work addresses the determination of meaningful and terse cluster labels for news document clusters. We analyze a number of alternatives for selecting headlines and/or sentences of documents in a document cluster (obtained as the result of an entity-event-duration query), and formalize an approach to extracting a short phrase from well-supported headlines/sentences of the cluster that can serve as the cluster label. Our technique maps a sentence into a set of significant stems to approximate its semantics for comparison. Finally, a cluster label is extracted from a selected headline/sentence as a contiguous sequence of words, restoring the word-sequencing information lost in the formalization of semantic equivalence.
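
The pipeline outlined in the abstract (stem sets for comparison, a well-supported headline, then a contiguous word span as the label) can be illustrated with a minimal sketch. This is not the authors' algorithm. The stopword list, the crude suffix-stripping `stem` function, the "at least half of the stems" support rule, the majority threshold, the tie-break toward shorter headlines, and the example headlines are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for",
             "and", "or", "is", "are", "was", "were", "by", "with", "after"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def significant_stems(sentence):
    """Map a sentence to its set of non-stopword stems."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {stem(w) for w in words if w not in STOPWORDS}


def support(candidate, headlines):
    """Count headlines sharing at least half of the candidate's stems."""
    stems = significant_stems(candidate)
    if not stems:
        return 0
    return sum(1 for h in headlines
               if 2 * len(stems & significant_stems(h)) >= len(stems))


def select_label(headlines):
    """Pick the best-supported headline, then cut a contiguous word span."""
    # Ties on support go to the headline with fewer significant stems.
    best = max(headlines,
               key=lambda h: (support(h, headlines), -len(significant_stems(h))))
    stem_sets = [significant_stems(h) for h in headlines]

    def well_supported(word):
        # A word counts if its stem occurs in a majority of the headlines.
        tokens = re.findall(r"[a-z0-9]+", word.lower())
        return any(t not in STOPWORDS
                   and 2 * sum(stem(t) in s for s in stem_sets) > len(headlines)
                   for t in tokens)

    words = best.split()
    hits = [i for i, w in enumerate(words) if well_supported(w)]
    if not hits:
        return best
    # Keep the original word order between the first and last supported word.
    return " ".join(words[hits[0]: hits[-1] + 1])


# Made-up headlines for illustration only.
headlines = [
    "City council approves new budget",
    "Council approved the budget on Monday",
    "Mayor praises budget vote",
    "Council approves budget after long debate",
]
label = select_label(headlines)
print(label)  # Council approved the budget
```

Comparing stem sets ignores word order, which is why the last step goes back to the chosen headline's original word sequence to produce a readable phrase.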

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007, Proceedings
Publisher: Springer Verlag
Pages: 119-130
Number of pages: 12
ISBN (Print): 3540733507, 9783540733508
State: Published - 2007
Externally published: Yes
Event: 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007 - Paris, France
Duration: Jun 27 2007 → Jun 29 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4592 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Applications of Natural Language to Information Systems, NLDB 2007
Country/Territory: France
City: Paris
Period: 6/27/07 → 6/29/07

ASJC Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

Disciplines

  • Bioinformatics
  • Communication
  • Communication Technology and New Media
  • Computer Sciences
  • Databases and Information Systems
  • Life Sciences
  • OS and Networks
  • Physical Sciences and Mathematics
  • Science and Technology Studies
  • Social and Behavioral Sciences
