Title | Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines |
---|---|
Published in | PLOS ONE, November 2012 |
DOI | 10.1371/journal.pone.0049110 |
Pubmed ID | |
Authors | Matthew Frampton, Richard Houlston |
Abstract | Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of publicly available software tools, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, they provide a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/. |
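
To make the abstract's idea concrete, the following is a minimal sketch of how a paired-end FASTQ record pair could be cut from a reference sequence and given Phred quality scores. It is not ArtificialFastqGenerator's code or interface: the function names (`make_read_pair`, `phred_to_ascii`, `phred_error_prob`), the fixed quality value, the read naming, and the Phred+33 encoding are all assumptions chosen for illustration, and coverage modelling by GC content and error simulation are left out.

```python
# Illustrative sketch only; not the ArtificialFastqGenerator implementation.
# All names and defaults below are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_to_ascii(quals, offset=33):
    """Encode integer Phred scores as a quality string (Phred+33 assumed)."""
    return "".join(chr(q + offset) for q in quals)


def phred_error_prob(q):
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def make_read_pair(reference, start, template_len, read_len, qual=30, name="read"):
    """Cut a DNA template from the reference and return two FASTQ records.

    Read 1 is the first read_len bases of the template; read 2 is the
    reverse complement of the last read_len bases.
    """
    template = reference[start:start + template_len]
    if len(template) < template_len or read_len > template_len:
        raise ValueError("template or read does not fit in the reference")
    fwd = template[:read_len]
    rev = reverse_complement(template[-read_len:])
    quality = phred_to_ascii([qual] * read_len)
    record1 = f"@{name}/1\n{fwd}\n+\n{quality}\n"
    record2 = f"@{name}/2\n{rev}\n+\n{quality}\n"
    return record1, record2


if __name__ == "__main__":
    ref = "ACGTACGTAACCGGTT"  # toy reference, not real data
    r1, r2 = make_read_pair(ref, start=0, template_len=12, read_len=4)
    print(r1, r2, sep="")
```

Because both reads are taken directly from a known position in the reference, the true alignment of each pair is known in advance, which is the property the abstract relies on when it describes the output as a gold standard.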
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 10% |
France | 2 | 10% |
Germany | 2 | 10% |
Norway | 1 | 5% |
Peru | 1 | 5% |
Australia | 1 | 5% |
Canada | 1 | 5% |
United Kingdom | 1 | 5% |
Montenegro | 1 | 5% |
Other | 1 | 5% |
Unknown | 7 | 35% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 12 | 60% |
Scientists | 7 | 35% |
Science communicators (journalists, bloggers, editors) | 1 | 5% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 7 | 4% |
United Kingdom | 2 | 1% |
Spain | 2 | 1% |
Switzerland | 1 | <1% |
France | 1 | <1% |
India | 1 | <1% |
Germany | 1 | <1% |
Colombia | 1 | <1% |
Italy | 1 | <1% |
Other | 0 | 0% |
Unknown | 144 | 89% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 48 | 30% |
Student > Ph. D. Student | 31 | 19% |
Student > Master | 24 | 15% |
Other | 10 | 6% |
Student > Bachelor | 9 | 6% |
Other | 27 | 17% |
Unknown | 12 | 7% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 74 | 46% |
Biochemistry, Genetics and Molecular Biology | 35 | 22% |
Computer Science | 13 | 8% |
Medicine and Dentistry | 7 | 4% |
Immunology and Microbiology | 6 | 4% |
Other | 11 | 7% |
Unknown | 15 | 9% |