Title | Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches |
---|---|
Published in | PLOS ONE, March 2011 |
DOI | 10.1371/journal.pone.0018029 |
Pubmed ID | |
Authors | Kevin W. Boyack, David Newman, Russell J. Duhon, Richard Klavans, Michael Patek, Joseph R. Biberstine, Bob Schijvenaars, André Skupin, Nianli Ma, Katy Börner |
Abstract | We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Australia | 1 | 50% |
Unknown | 1 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 2 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 6 | 3% |
Netherlands | 3 | 1% |
Germany | 3 | 1% |
Canada | 3 | 1% |
United States | 3 | 1% |
Australia | 2 | <1% |
Sweden | 2 | <1% |
France | 2 | <1% |
Denmark | 2 | <1% |
Other | 11 | 5% |
Unknown | 201 | 84% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 53 | 22% |
Student > Ph. D. Student | 37 | 16% |
Other | 23 | 10% |
Student > Master | 20 | 8% |
Student > Doctoral Student | 15 | 6% |
Other | 55 | 23% |
Unknown | 35 | 15% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 71 | 30% |
Social Sciences | 28 | 12% |
Agricultural and Biological Sciences | 22 | 9% |
Business, Management and Accounting | 13 | 5% |
Medicine and Dentistry | 11 | 5% |
Other | 47 | 20% |
Unknown | 46 | 19% |