
PLOS

The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text

Overview of attention for article published in PLOS ONE, June 2013

Mentioned by
3 X users

Citations
146 Dimensions

Readers on
114 Mendeley
3 CiteULike

Title
The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text
Published in
PLOS ONE, June 2013
DOI 10.1371/journal.pone.0065390
Pubmed ID
Authors

Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen

Abstract

The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster than existing tools and equally accurate. Precision and recall were assessed both on an existing gold-standard corpus and on a new corpus of 800 abstracts, which were manually annotated after the development of the tool. The new corpus comprises abstracts from journals selected to represent many taxonomic groups, which gives insight into which types of organism names are hard to detect and which are easy. Finally, we have tagged organism names in the entire Medline database and developed a web resource, ORGANISMS, that makes the results accessible to the broad community of biologists. The SPECIES software is open source and can be downloaded from http://species.jensenlab.org along with dictionary files and the manually annotated gold-standard corpus. The ORGANISMS web resource can be found at http://organisms.jensenlab.org.
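The abstract describes SPECIES only as a dictionary-based approach to named entity recognition. As a rough illustration of that general idea, and not of the SPECIES algorithm itself (whose details are given in the paper), the following Python sketch tokenizes text, lowercases it, and looks up the longest matching run of tokens in a hash table of names. The toy dictionary, the helper names build_dictionary and tag, and the use of NCBI Taxonomy identifiers as values are illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

# Word tokens and single punctuation characters (so "E. coli" -> "E", ".", "coli").
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def build_dictionary(names: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Hypothetical helper: normalize names to lowercase, space-joined tokens.

    Returns the lookup table and the length (in tokens) of the longest name.
    """
    lookup: Dict[str, int] = {}
    max_tokens = 1
    for name, taxid in names.items():
        tokens = TOKEN_RE.findall(name.lower())
        lookup[" ".join(tokens)] = taxid
        max_tokens = max(max_tokens, len(tokens))
    return lookup, max_tokens


def tag(text: str, lookup: Dict[str, int], max_tokens: int) -> List[Tuple[int, int, int]]:
    """Hypothetical tagger: return (start, end, taxid) for each dictionary hit."""
    spans = [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    matches: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(spans):
        hit = None
        # Try the longest candidate first so "Escherichia coli" wins over shorter runs.
        for n in range(min(max_tokens, len(spans) - i), 0, -1):
            key = " ".join(text[s:e].lower() for s, e in spans[i:i + n])
            if key in lookup:
                hit = (spans[i][0], spans[i + n - 1][1], lookup[key])
                break
        if hit is not None:
            matches.append(hit)
            i += n  # skip past the matched tokens
        else:
            i += 1
    return matches


if __name__ == "__main__":
    # Illustrative toy dictionary keyed to NCBI Taxonomy IDs (assumption).
    names = {"Homo sapiens": 9606, "human": 9606, "Mus musculus": 10090,
             "mouse": 10090, "E. coli": 562, "Escherichia coli": 562}
    lookup, max_tokens = build_dictionary(names)
    text = "Genes from Escherichia coli were expressed in mouse and human cells."
    for start, end, taxid in tag(text, lookup, max_tokens):
        print(text[start:end], taxid)
    # Escherichia coli 562
    # mouse 10090
    # human 9606
```

Trying the longest candidate first means a multi-word name is reported as one match rather than as a shorter sub-match. A practical tagger would also need to handle orthographic variants, ambiguous names and overlapping hits, which this sketch ignores.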

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 114 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Spain 3 3%
United States 3 3%
United Kingdom 2 2%
Brazil 1 <1%
Denmark 1 <1%
Germany 1 <1%
Japan 1 <1%
Mexico 1 <1%
Unknown 101 89%

Demographic breakdown

Readers by professional status Count As %
Researcher 31 27%
Student > Master 24 21%
Student > Ph. D. Student 11 10%
Student > Bachelor 8 7%
Other 7 6%
Other 15 13%
Unknown 18 16%

Readers by discipline Count As %
Agricultural and Biological Sciences 31 27%
Computer Science 27 24%
Biochemistry, Genetics and Molecular Biology 13 11%
Medicine and Dentistry 5 4%
Immunology and Microbiology 2 2%
Other 12 11%
Unknown 24 21%