Abstract
This paper develops a modular approach to improving the effectiveness of searching documents for information by reusing and integrating mature software components such as the Lucene APIs, WORDNET, LSA techniques, and a domain-specific controlled vocabulary. To evaluate the practical benefits, the prototype was used to query the MEDLINE database and to locate domain-specific controlled vocabulary terms in Materials and Process Specifications. Its extensibility has been demonstrated by incorporating a spell-checker for the input query and by structuring the retrieved output into hierarchical collections for quicker assimilation. It is also being used to experimentally explore the relationship between LSA and document clustering using the 20-mini-newsgroups and Reuters data sets. In the future, this prototype will be used as an experimental testbed for expressive, context-aware, and scalable searches.
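As an illustration of the query spell-checking step mentioned in the abstract, the following is a minimal sketch, not the prototype's actual implementation (which builds on the Lucene APIs). It assumes a small hypothetical controlled vocabulary and uses Python's standard `difflib` module to map each misspelled query token to its closest vocabulary term. The names `VOCABULARY` and `correct_query` and the similarity cutoff of 0.8 are assumptions introduced only for this example.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary; the paper's actual term list is not given.
VOCABULARY = ["alloy", "annealing", "corrosion", "hardness", "titanium", "welding"]


def correct_query(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each query token with its closest vocabulary term, if one is close enough.

    Tokens already in the vocabulary, and tokens with no match at or above
    the similarity cutoff, are passed through unchanged.
    """
    corrected = []
    for token in query.lower().split():
        if token in vocabulary:
            corrected.append(token)
            continue
        # get_close_matches returns at most n candidates, best match first.
        matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


if __name__ == "__main__":
    print(correct_query("titanum welding"))
```

Under these assumptions, the misspelled token `titanum` is rewritten to `titanium` before the query is passed on to the search component, while a token with no close vocabulary match is left as typed.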
Original language | English |
---|---|
Title of host publication | Proceedings of the IASTED International Conference on Web Technologies, Applications, and Services, WTAS 2005 |
Editors | M.H. Hamza |
Publisher | ACTA Press |
Pages | 165-170 |
Number of pages | 6 |
ISBN (Print) | 0889864853, 9780889864856 |
State | Published - 2005 |
Event | International Conference on Web Technologies, Applications, and Services, WTAS 2005, Calgary, AB, Canada. Duration: Jul 4 2005 → Jul 6 2005 |
Conference
Conference | International Conference on Web Technologies, Applications, and Services, WTAS 2005 |
---|---|
Country/Territory | Canada |
City | Calgary, AB |
Period | 7/4/05 → 7/6/05 |
ASJC Scopus Subject Areas
- General Engineering
Keywords
- Document Clustering
- Domain-Specific Search
- Latent Semantic Indexing
- Modular Search Engine
- Search and Querying
- Tools
Disciplines
- Databases and Information Systems
- Cataloging and Metadata