Wrangling Phosphoproteomic Data to Elucidate Cancer Signaling Pathways

Overview of attention for article published in PLOS ONE, January 2013

Mentioned by
1 Facebook page

Citations
19 Dimensions

Readers on
65 Mendeley
2 CiteULike

Title
Wrangling Phosphoproteomic Data to Elucidate Cancer Signaling Pathways
Published in
PLOS ONE, January 2013
DOI 10.1371/journal.pone.0052884
Pubmed ID
Authors

Mark L. Grimes, Wan-Jui Lee, Laurens van der Maaten, Paul Shannon

Abstract

The interpretation of biological data sets is essential for generating hypotheses that guide research, yet modern methods of global analysis challenge our ability to discern meaningful patterns and then convey results in a way that can be easily appreciated. Proteomic data is especially challenging because mass spectrometry detectors often miss peptides in complex samples, resulting in sparsely populated data sets. Using the R programming language and techniques from the field of pattern recognition, we have devised methods to resolve and evaluate clusters of proteins related by their pattern of expression in different samples in proteomic data sets. We examined tyrosine phosphoproteomic data from lung cancer samples. We calculated dissimilarities between the proteins based on Pearson or Spearman correlations and on Euclidean distances, whilst dealing with large amounts of missing data. The dissimilarities were then used as feature vectors in clustering and visualization algorithms. The quality of the clusterings and visualizations was evaluated internally based on the primary data and externally based on gene ontology and protein interaction networks. The results show that t-distributed stochastic neighbor embedding (t-SNE) followed by minimum spanning tree methods groups sparse proteomic data into meaningful clusters more effectively than other methods such as k-means and classical multidimensional scaling. Furthermore, our results show that using a combination of Spearman correlation and Euclidean distance as a dissimilarity representation increases the resolution of clusters. Our analyses show that many clusters contain one or more tyrosine kinases and include known effectors as well as proteins with no known interactions. Visualizing these clusters as networks elucidated previously unknown tyrosine kinase signal transduction pathways that drive cancer. Our approach can be applied to other data types, and can be easily adopted because open source software packages are employed.

Mendeley readers

The data shown below were compiled from readership statistics for 65 Mendeley readers of this research output.

Geographical breakdown

Country         Count  As %
United Kingdom      2    3%
Canada              1    2%
Unknown            62   95%

Demographic breakdown

Readers by professional status  Count  As %
Researcher                         18   28%
Student > Ph. D. Student           15   23%
Student > Doctoral Student          3    5%
Student > Master                    3    5%
Student > Bachelor                  2    3%
Other                               8   12%
Unknown                            16   25%

Readers by discipline                         Count  As %
Agricultural and Biological Sciences             27   42%
Biochemistry, Genetics and Molecular Biology     10   15%
Computer Science                                  3    5%
Engineering                                       2    3%
Chemistry                                         2    3%
Other                                             6    9%
Unknown                                          15   23%