PLOS

Optimizing Read Mapping to Reference Genomes to Determine Composition and Species Prevalence in Microbial Communities

Overview of attention for article published in PLOS ONE, June 2012

Mentioned by

1 blog
11 X users

Citations

50 Dimensions

Readers on

230 Mendeley
10 CiteULike

Title: Optimizing Read Mapping to Reference Genomes to Determine Composition and Species Prevalence in Microbial Communities
Published in: PLOS ONE, June 2012
DOI: 10.1371/journal.pone.0036427
Authors: John Martin, Sean Sykes, Sarah Young, Karthik Kota, Ravi Sanka, Nihar Sheth, Joshua Orvis, Erica Sodergren, Zhengyuan Wang, George M. Weinstock, Makedonka Mitreva

Abstract

The Human Microbiome Project (HMP) aims to characterize the microbial communities of 18 body sites from healthy individuals. To accomplish this, the HMP generated two types of shotgun data: reference shotgun sequences isolated from different anatomical sites on the human body and shotgun metagenomic sequences from the microbial communities of each site. The alignment strategy for characterizing these metagenomic communities using available reference sequences is important to the success of HMP data analysis. Six next-generation aligners were used to align a community of known composition against a database comprising reference organisms known to be present in that community. All aligners report nearly complete genome coverage (>97%) for strains with over 6X depth of coverage; however, they differ in speed, memory requirements, and ease-of-use issues such as database size limitations and supported mapping strategies. The selected aligner was tested across a range of parameters to maximize sensitivity while maintaining a low false positive rate. We found that constraining alignment length had more impact on sensitivity than constraining similarity in all cases tested. However, when reference species were replaced with phylogenetic neighbors, similarity began to play a larger role in detection. We also show that choosing the top hit randomly when multiple equally strong mappings are available increases overall sensitivity at the expense of taxonomic resolution. The results of this study identified a strategy that was used to map over 3 terabases of microbial sequence against a database of more than 5,000 reference genomes in just over a month.
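
The filtering and tie-breaking ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function names, and the default thresholds (`min_len_frac=0.75`, `min_identity=80.0`) are hypothetical values chosen only for illustration. The sketch assumes each alignment has already been parsed into a record with an aligned length, a percent identity, and a score.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-to-reference alignment (hypothetical record layout)."""
    read_id: str
    ref_id: str
    aligned_len: int  # read bases covered by the alignment
    read_len: int     # full length of the read
    identity: float   # percent identity over the aligned region, 0-100
    score: float      # aligner score; higher is better


def passes(hit, min_len_frac, min_identity):
    """Apply the two constraints: alignment length fraction and similarity."""
    long_enough = hit.aligned_len / hit.read_len >= min_len_frac
    similar_enough = hit.identity >= min_identity
    return long_enough and similar_enough


def pick_top_hits(hits, min_len_frac=0.75, min_identity=80.0, rng=None):
    """Keep one hit per read, choosing randomly among equally scored best hits.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible

    # Group the hits that survive both filters by read.
    by_read = {}
    for hit in hits:
        if passes(hit, min_len_frac, min_identity):
            by_read.setdefault(hit.read_id, []).append(hit)

    # For each read, break ties between top-scoring hits at random.
    chosen = {}
    for read_id, candidates in by_read.items():
        best = max(c.score for c in candidates)
        ties = [c for c in candidates if c.score == best]
        chosen[read_id] = rng.choice(ties)
    return chosen
```

Under this scheme a read that maps equally well to two references is still counted, which is how random tie-breaking can raise sensitivity, but the reference it is assigned to is arbitrary, which is the loss of taxonomic resolution the abstract mentions.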

X Demographics

The data shown below were collected from the profiles of 11 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 230 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 22 10%
France 3 1%
Italy 2 <1%
Denmark 2 <1%
Belgium 2 <1%
India 2 <1%
Canada 2 <1%
United Kingdom 1 <1%
Netherlands 1 <1%
Other 5 2%
Unknown 188 82%

Demographic breakdown

Readers by professional status Count As %
Researcher 74 32%
Student > Ph. D. Student 51 22%
Student > Master 27 12%
Student > Bachelor 13 6%
Other 13 6%
Other 42 18%
Unknown 10 4%
Readers by discipline Count As %
Agricultural and Biological Sciences 150 65%
Biochemistry, Genetics and Molecular Biology 18 8%
Medicine and Dentistry 12 5%
Computer Science 9 4%
Immunology and Microbiology 8 3%
Other 19 8%
Unknown 14 6%