Report for: Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Title	Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
Published in	PLOS ONE, November 2012
DOI	10.1371/journal.pone.0049110
PubMed ID	23152858
Authors	Matthew Frampton, Richard Houlston
Abstract	Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of publicly available software tools, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, they provide a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
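The core idea in the abstract is to sample DNA templates from a reference, emit paired-end reads with Phred quality scores, and optionally inject sequencing errors. Because every read's true origin is known, the output can serve as a gold standard. The minimal Python sketch below illustrates that idea under stated assumptions. It is not ArtificialFastqGenerator's implementation. The function names (`simulate_pairs`, `mutate`, etc.), the uniform template sampling, the constant quality score, and the error model (a substitution with probability 10^(-Q/10)) are all hypothetical choices made for illustration. GC-content coverage modelling and real quality scores taken from existing FASTQ files are omitted.

```python
import random

# Complement table used to reverse-complement the second read of each pair.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def phred_char(q: int, offset: int = 33) -> str:
    """Encode a Phred score as an ASCII character (Sanger / Illumina 1.8+ offset 33)."""
    return chr(q + offset)


def error_prob(q: int) -> float:
    """Probability that a base call is wrong for Phred score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mutate(seq: str, quals: list[int], rng: random.Random) -> str:
    """Hypothetical error model: substitute each base with probability error_prob(Q)."""
    out = []
    for base, q in zip(seq, quals):
        if rng.random() < error_prob(q):
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def simulate_pairs(reference: str, n_pairs: int, template_len: int = 300,
                   read_len: int = 100, quality: int = 30, seed: int = 0,
                   simulate_errors: bool = True) -> tuple[str, str]:
    """Return (read1 FASTQ text, read2 FASTQ text) sampled uniformly from `reference`."""
    rng = random.Random(seed)
    quals = [quality] * read_len  # assumption: constant quality for every base
    qual_str = "".join(phred_char(q) for q in quals)
    r1_records, r2_records = [], []
    for i in range(n_pairs):
        # Pick a template start position uniformly (no GC-content weighting here).
        start = rng.randrange(0, len(reference) - template_len + 1)
        template = reference[start:start + template_len]
        read1 = template[:read_len]                        # forward strand, 5' end
        read2 = reverse_complement(template[-read_len:])   # mate from the 3' end
        if simulate_errors:
            read1 = mutate(read1, quals, rng)
            read2 = mutate(read2, quals, rng)
        # The true 1-based origin is stored in the read name as the gold standard.
        name = f"@sim_{i}_pos{start + 1}"
        r1_records.append(f"{name}/1\n{read1}\n+\n{qual_str}\n")
        r2_records.append(f"{name}/2\n{read2}\n+\n{qual_str}\n")
    return "".join(r1_records), "".join(r2_records)
```

The two returned strings could be written to `reads_1.fastq` and `reads_2.fastq` (hypothetical file names) and passed through an alignment and variant-calling pipeline. Mapped positions can then be compared with the `pos` value embedded in each read name.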


X Demographics

The data shown below were collected from the profiles of 20 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	2	10%
France	2	10%
Germany	2	10%
Norway	1	5%
Peru	1	5%
Australia	1	5%
Canada	1	5%
United Kingdom	1	5%
Montenegro	1	5%
Other	1	5%
Unknown	7	35%

Demographic breakdown

Type	Count	As %
Members of the public	12	60%
Scientists	7	35%
Science communicators (journalists, bloggers, editors)	1	5%

Mendeley readers

The data shown below were compiled from readership statistics for 161 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	7	4%
United Kingdom	2	1%
Spain	2	1%
Switzerland	1	<1%
France	1	<1%
India	1	<1%
Germany	1	<1%
Colombia	1	<1%
Italy	1	<1%
Other	0	0%
Unknown	144	89%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	48	30%
Student > Ph. D. Student	31	19%
Student > Master	24	15%
Other	10	6%
Student > Bachelor	9	6%
Other (all remaining statuses)	27	17%
Unknown	12	7%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	74	46%
Biochemistry, Genetics and Molecular Biology	35	22%
Computer Science	13	8%
Medicine and Dentistry	7	4%
Immunology and Microbiology	6	4%
Other	11	7%
Unknown	15	9%
