Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Overview of attention for article published in PLOS ONE, November 2012
Mentioned by

1 news outlet
20 X users

Citations

52 Dimensions

Readers on

161 Mendeley
6 CiteULike

DOI 10.1371/journal.pone.0049110
Authors

Matthew Frampton, Richard Houlston

Abstract

Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software tools, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, they provide a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
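
The idea in the abstract, deriving paired-end reads from a reference so that the true origin of every read is known, can be illustrated with a minimal Python sketch. This is not ArtificialFastqGenerator's code or interface: the function names, the read-naming scheme, the uniform sampling of template positions and the constant Phred score are all assumptions made for illustration, and GC-dependent coverage and sequencing errors are left out.

```python
import random

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_to_ascii(scores) -> str:
    """Encode Phred scores as a Phred+33 quality string."""
    return "".join(chr(q + 33) for q in scores)


def make_read_pair(reference, start, template_len, read_len, quality=40):
    """Cut one template from the reference and read it from both ends.

    The constant quality score is an illustrative assumption.
    """
    if read_len > template_len:
        raise ValueError("read length exceeds template length")
    template = reference[start:start + template_len]
    if len(template) < template_len:
        raise ValueError("template runs past the end of the reference")
    read1 = template[:read_len]                        # forward read
    read2 = reverse_complement(template[-read_len:])   # reverse read
    return read1, read2, phred_to_ascii([quality] * read_len)


def fastq_record(name, seq, qual) -> str:
    """Format one four-line FASTQ record."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


def simulate(reference, n_pairs, template_len, read_len, seed=0):
    """Return the text of two paired FASTQ files as strings.

    Each read name records the template start position (a hypothetical
    naming scheme), so an aligner's output can be checked against it.
    """
    rng = random.Random(seed)
    out1, out2 = [], []
    for i in range(n_pairs):
        start = rng.randrange(len(reference) - template_len + 1)
        r1, r2, qual = make_read_pair(reference, start, template_len, read_len)
        name = f"pair{i}:pos{start}"
        out1.append(fastq_record(name + "/1", r1, qual))
        out2.append(fastq_record(name + "/2", r2, qual))
    return "".join(out1), "".join(out2)
```

Because each read name carries the position it was cut from, the positions reported by an aligner can be compared directly with the known truth, which is the sense in which the reference-derived reads act as a gold standard.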

X Demographics

Demographic data were collected from the profiles of 20 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 161 Mendeley readers of this research output.

Geographical breakdown

Country           Count   As %
United States         7     4%
United Kingdom        2     1%
Spain                 2     1%
Switzerland           1    <1%
France                1    <1%
India                 1    <1%
Germany               1    <1%
Colombia              1    <1%
Italy                 1    <1%
Other                 0     0%
Unknown             144    89%

Demographic breakdown

Readers by professional status                 Count   As %
Researcher                                        48    30%
Student > Ph. D. Student                          31    19%
Student > Master                                  24    15%
Other                                             10     6%
Student > Bachelor                                 9     6%
Other                                             27    17%
Unknown                                           12     7%

Readers by discipline                          Count   As %
Agricultural and Biological Sciences              74    46%
Biochemistry, Genetics and Molecular Biology      35    22%
Computer Science                                  13     8%
Medicine and Dentistry                             7     4%
Immunology and Microbiology                        6     4%
Other                                             11     7%
Unknown                                           15     9%