Title | Optimality Driven Nearest Centroid Classification from Genomic Data |
---|---|
Published in | PLOS ONE, October 2007 |
DOI | 10.1371/journal.pone.0001002 |
Pubmed ID | |
Authors | Alan R. Dabney, John D. Storey |
Abstract | Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers. |
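The abstract contrasts scoring features one at a time with choosing a subset that performs well as a whole. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes plain squared Euclidean distance, uses training error as a stand-in selection criterion (the paper instead derives a theoretically optimal criterion), and forms centroids as simple class means with no shrinkage. All function names and the toy data are hypothetical.

```python
from statistics import mean

def centroids(X, y, features):
    """Per-class mean of the chosen features (the usual maximum likelihood centroid)."""
    out = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        out[label] = [mean(r[j] for r in rows) for j in features]
    return out

def predict(x, cents, features):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    def dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(features, cents[label]))
    return min(sorted(cents), key=dist)

def training_error(X, y, features):
    """Fraction of training samples misclassified using only `features`.

    Assumption: a stand-in criterion, not the one derived in the paper.
    """
    cents = centroids(X, y, features)
    wrong = sum(predict(x, cents, features) != lab for x, lab in zip(X, y))
    return wrong / len(y)

def greedy_select(X, y, k):
    """Forward selection: at each step add the feature that gives the
    lowest error for the subset as a whole, rather than ranking features
    by individual scores."""
    chosen = []
    remaining = list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda j: training_error(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [2.0, 4.0], [2.1, 0.5], [1.9, 2.5]]
y = ["a", "a", "a", "b", "b", "b"]

selected = greedy_select(X, y, 1)
cents = centroids(X, y, selected)
label = predict([1.8, 9.0], cents, selected)
```

On this toy data the selection keeps feature 0 and ignores the noisy feature 1, so the new sample is classified by its first coordinate alone. With real expression data, the criterion would normally be evaluated on held-out samples rather than on the training set.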
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 5 | 9% |
United Kingdom | 1 | 2% |
Spain | 1 | 2% |
Brazil | 1 | 2% |
Unknown | 45 | 85% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 18 | 34% |
Student > Ph. D. Student | 8 | 15% |
Professor > Associate Professor | 6 | 11% |
Student > Bachelor | 5 | 9% |
Student > Doctoral Student | 4 | 8% |
Other | 9 | 17% |
Unknown | 3 | 6% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 13 | 25% |
Computer Science | 9 | 17% |
Mathematics | 7 | 13% |
Medicine and Dentistry | 3 | 6% |
Engineering | 3 | 6% |
Other | 13 | 25% |
Unknown | 5 | 9% |