
PLOS

A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data

Overview of attention for article published in PLoS Computational Biology, August 2014

Mentioned by

2 blogs
21 X users
1 Facebook page

Citations

29 Dimensions

Readers on

83 Mendeley
2 CiteULike
Title
A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data
Published in
PLoS Computational Biology, August 2014
DOI 10.1371/journal.pcbi.1003737
Pubmed ID
Authors

Yuan Zhang, Yanni Sun, James R. Cole

Abstract

Gene assembly, which recovers gene segments from short reads, is an important step in the functional analysis of next-generation sequencing data. Because quality reference genomes are often lacking, de novo assembly is commonly used for RNA-Seq data of non-model organisms and for metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented or chimeric contigs, or have a high memory footprint. In this work, we introduce a targeted gene assembly program, SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that, compared with other assembly tools, SAT-Assembler has smaller memory usage, comparable or better gene coverage, and a lower chimera rate when assembling a set of genes from one or multiple pathways. Moreover, the family-specific design and rapid homology search make SAT-Assembler naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in the supplementary material.
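
As a rough illustration of what "homology-guided overlap graph construction" followed by graph traversal could look like, here is a minimal Python sketch. It is not the SAT-Assembler implementation. The names `AlignedRead`, `model_start`, `min_overlap`, `build_overlap_graph`, and `greedy_contigs` are hypothetical, and the sketch assumes that a prior homology search has already assigned each read a start coordinate (in nucleotides) on a gene-family model. Only reads whose model coordinates could overlap are compared, and the traversal is a simple greedy walk rather than the careful traversal the abstract refers to.

```python
from typing import Dict, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    name: str
    seq: str
    model_start: int  # assumed: start position of the read on the family model


def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `a` equal to a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(
    reads: List[AlignedRead], min_overlap: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Connect reads left to right, using model positions to prune comparisons."""
    ordered = sorted(reads, key=lambda r: r.model_start)
    graph: Dict[str, List[Tuple[str, int]]] = {r.name: [] for r in ordered}
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            # Homology guidance: reads that start past the end of `left`
            # on the model cannot overlap it, and neither can later ones.
            if right.model_start >= left.model_start + len(left.seq):
                break
            k = suffix_prefix_overlap(left.seq, right.seq, min_overlap)
            if k:
                graph[left.name].append((right.name, k))
    return graph


def greedy_contigs(
    reads: List[AlignedRead], graph: Dict[str, List[Tuple[str, int]]]
) -> List[str]:
    """Walk from each read with no incoming edge, following the longest overlap."""
    seqs = {r.name: r.seq for r in reads}
    has_incoming = {dst for edges in graph.values() for dst, _ in edges}
    contigs = []
    for start in graph:
        if start in has_incoming:
            continue
        contig, node, visited = seqs[start], start, {start}
        while True:
            candidates = [(k, dst) for dst, k in graph[node] if dst not in visited]
            if not candidates:
                break
            k, node = max(candidates)
            visited.add(node)
            contig += seqs[node][k:]  # append only the non-overlapping tail
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    toy_reads = [
        AlignedRead("r1", "ATGGCGT", 0),
        AlignedRead("r2", "GCGTACC", 3),
        AlignedRead("r3", "TACCGGA", 6),
    ]
    print(greedy_contigs(toy_reads, build_overlap_graph(toy_reads)))
```

Because each gene family is processed independently in a design like this, separate families could be handed to separate workers, which is in the spirit of the parallel-computing compatibility the abstract mentions.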

X Demographics

The data shown below were collected from the profiles of 21 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 83 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 3 4%
Germany 2 2%
Netherlands 1 1%
Taiwan 1 1%
Brazil 1 1%
Japan 1 1%
Spain 1 1%
Unknown 73 88%

Demographic breakdown

Readers by professional status Count As %
Researcher 27 33%
Student > Master 13 16%
Student > Ph. D. Student 12 14%
Student > Bachelor 9 11%
Other 6 7%
Other 9 11%
Unknown 7 8%
Readers by discipline Count As %
Agricultural and Biological Sciences 45 54%
Biochemistry, Genetics and Molecular Biology 18 22%
Computer Science 3 4%
Medicine and Dentistry 3 4%
Environmental Science 2 2%
Other 2 2%
Unknown 10 12%