Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Overview of attention for article published in PLOS ONE, March 2011

Mentioned by

1 news outlet
4 blogs
2 X users
1 patent
1 Facebook page

Citations

235 Dimensions

Readers on

238 Mendeley
16 CiteULike
1 Connotea
Title: Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches
Published in: PLOS ONE, March 2011
DOI: 10.1371/journal.pone.0018029
Pubmed ID
Authors

Kevin W. Boyack, David Newman, Russell J. Duhon, Richard Klavans, Michael Patek, Joseph R. Biberstine, Bob Schijvenaars, André Skupin, Nianli Ma, Katy Börner

Abstract

We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.

X Demographics

The data were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 238 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United Kingdom       6     3%
Netherlands          3     1%
Germany              3     1%
Canada               3     1%
United States        3     1%
Australia            2    <1%
Sweden               2    <1%
France               2    <1%
Denmark              2    <1%
Other               11     5%
Unknown            201    84%

Demographic breakdown

Readers by professional status   Count   As %
Researcher                          53    22%
Student > Ph. D. Student            37    16%
Other                               23    10%
Student > Master                    20     8%
Student > Doctoral Student          15     6%
Other                               55    23%
Unknown                             35    15%

Readers by discipline                  Count   As %
Computer Science                          71    30%
Social Sciences                           28    12%
Agricultural and Biological Sciences      22     9%
Business, Management and Accounting       13     5%
Medicine and Dentistry                    11     5%
Other                                     47    20%
Unknown                                   46    19%