Report for: Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge

Title	Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
Published in	PLOS ONE, July 2013
DOI	10.1371/journal.pone.0068141
Pubmed ID	23874524
Authors	Sara Mostafavi, Alexis Battle, Xiaowei Zhu, Alexander E. Urban, Douglas Levinson, Stephen B. Montgomery, Daphne Koller
Abstract	Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.


X Demographics

The data shown below were collected from the profiles of 18 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	7	39%
France	2	11%
Montenegro	1	6%
Canada	1	6%
Germany	1	6%
China	1	6%
United Kingdom	1	6%
Unknown	4	22%

Demographic breakdown

Type	Count	As %
Scientists	11	61%
Members of the public	6	33%
Science communicators (journalists, bloggers, editors)	1	6%

Mendeley readers

The data shown below were compiled from readership statistics for 176 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	11	6%
Norway	1	<1%
Germany	1	<1%
Spain	1	<1%
Slovenia	1	<1%
Unknown	161	91%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	64	36%
Researcher	37	21%
Student > Master	17	10%
Student > Bachelor	14	8%
Professor > Associate Professor	9	5%
Other	26	15%
Unknown	9	5%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	81	46%
Biochemistry, Genetics and Molecular Biology	34	19%
Computer Science	22	13%
Medicine and Dentistry	8	5%
Mathematics	5	3%
Other	12	7%
Unknown	14	8%
