A Modular Approach to Document Indexing and Semantic Search

Dhanya Ravishankar, Krishnaprasad Thirunarayan, Trivikram Immaneni

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

This paper develops a modular approach to improving the effectiveness of searching documents for information by reusing and integrating mature software components such as Lucene APIs, WORDNET, LSA techniques, and domain-specific controlled vocabulary. To evaluate the practical benefits, the prototype was used to query the MEDLINE database and to locate domain-specific controlled vocabulary terms in Materials and Process Specifications. Its extensibility has been demonstrated by incorporating a spell-checker for the input query and by structuring the retrieved output into hierarchical collections for quicker assimilation. It is also being used to experimentally explore the relationship between LSA and document clustering using the 20-mini-newsgroups and Reuters data sets. In the future, this prototype will be used as an experimental testbed for expressive, context-aware, and scalable searches.
Original language: English
Title of host publication: Proceedings of the IASTED International Conference on Web Technologies, Applications, and Services, WTAS 2005
Editors: M.H. Hamza
Publisher: ACTA Press
Pages: 165-170
Number of pages: 6
ISBN (Print): 0889864853, 9780889864856
State: Published - 2005
Event: International Conference on Web Technologies, Applications, and Services, WTAS 2005 - Calgary, AB, Canada
Duration: Jul 4 2005 to Jul 6 2005

Conference

Conference: International Conference on Web Technologies, Applications, and Services, WTAS 2005
Country/Territory: Canada
City: Calgary, AB
Period: 7/4/05 to 7/6/05

ASJC Scopus Subject Areas

  • General Engineering

Keywords

  • Document Clustering
  • Domain-Specific Search
  • Latent Semantic Indexing
  • Modular Search Engine
  • Search and Querying
  • Tools

Disciplines

  • Databases and Information Systems
  • Cataloging and Metadata
