PLOS

Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

Overview of attention for article published in PLOS ONE, February 2012

Mentioned by

1 blog
20 X users
7 Wikipedia pages

Citations

228 Dimensions

Readers on

502 Mendeley
10 CiteULike
Title
Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data
Published in
PLOS ONE, February 2012
DOI
10.1371/journal.pone.0031386
Pubmed ID
Authors

Daniel R. Mende, Alison S. Waller, Shinichi Sunagawa, Aino I. Järvelin, Michelle M. Chan, Manimozhiyan Arumugam, Jeroen Raes, Peer Bork

Abstract

Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.

X Demographics

The data shown below were collected from the profiles of 20 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 502 Mendeley readers of this research output.

Geographical breakdown

Country           Count   As %
United States        17     3%
Brazil                6     1%
Spain                 6     1%
Germany               4    <1%
France                4    <1%
United Kingdom        4    <1%
Sweden                3    <1%
Belgium               3    <1%
Portugal              3    <1%
Other                19     4%
Unknown             433    86%

Demographic breakdown

Readers by professional status                Count   As %
Student > Ph. D. Student                        125    25%
Researcher                                      121    24%
Student > Master                                 70    14%
Student > Bachelor                               37     7%
Student > Doctoral Student                       26     5%
Other                                            88    18%
Unknown                                          35     7%

Readers by discipline                         Count   As %
Agricultural and Biological Sciences            289    58%
Biochemistry, Genetics and Molecular Biology     62    12%
Computer Science                                 31     6%
Environmental Science                            17     3%
Immunology and Microbiology                      10     2%
Other                                            50    10%
Unknown                                          43     9%