A Data Mining Approach to Predicting Phylum for Microbial Organisms Using Genome-Wide Sequence Data

Rao M. Kotamarti, Douglas W. Raiford, Michael L. Raymer, Margaret H. Dunham

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Genomic sequencing projects are generating vast stores of data that provide opportunities and challenges in data analysis. Investigations of trends in codon usage have proven to be a rich area of study in this field. There are a number of methods for isolating codon usage bias in microbial organisms, each designed to capture a specific aspect of the bias. We posit that each species has evolved under the influence of a unique set of environmental constraints that has governed the shaping of the organism's codon usage. Analysis of codon usage data should, therefore, provide insights into the selection process at work influencing genomic composition. To this end, we describe the large-scale mining of genome-level data from several codon usage bias isolation techniques to determine whether this information can be used to predict the phylum and class to which each organism belongs. Successful prediction is an indication that the forces molding the codon usage of a given phylum/class are indeed distinctive, and that such information would be of use in understanding the evolutionary forces involved. Additionally, it supports using this method to aid in, and validate, existing taxonomic classification techniques.

Original language: American English
Title of host publication: Proceedings of the 2009 9th IEEE International Conference on Bioinformatics and BioEngineering, BIBE 2009
Publisher: IEEE
Pages: 161-167
Number of pages: 7
ISBN (Print): 9780769536569
State: Published - Jun 21 2009
Event: 2009 9th IEEE International Conference on Bioinformatics and BioEngineering, BIBE 2009 - Taichung, Taiwan, Province of China
Duration: Jun 22 2009 - Jun 24 2009

Conference

Conference: 2009 9th IEEE International Conference on Bioinformatics and BioEngineering, BIBE 2009
Country/Territory: Taiwan, Province of China
City: Taichung
Period: 6/22/09 - 6/24/09

ASJC Scopus Subject Areas

  • Information Systems
  • Biomedical Engineering
  • Health Informatics

Keywords

  • Data mining
  • Organisms
  • Genomics
  • Bioinformatics
  • Sequences
  • Microorganisms
  • Databases
  • Data analysis
  • Computer science
  • Genetic engineering

Disciplines

  • Bioinformatics
  • Communication Technology and New Media
  • Databases and Information Systems
  • OS and Networks
  • Science and Technology Studies
