Report for: The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text

Title	The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text
Published in	PLOS ONE, June 2013
DOI	10.1371/journal.pone.0065390
Pubmed ID	23823062
Authors	Evangelos Pafilis, Sune P. Frankild, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Aikaterini Vasileiadou, Christos Arvanitidis, Lars Juhl Jensen
Abstract	The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster and as accurate as existing tools. The precision and recall was assessed both on an existing gold-standard corpus and on a new corpus of 800 abstracts, which were manually annotated after the development of the tool. The corpus comprises abstracts from journals selected to represent many taxonomic groups, which gives insights into which types of organism names are hard to detect and which are easy. Finally, we have tagged organism names in the entire Medline database and developed a web resource, ORGANISMS, that makes the results accessible to the broad community of biologists. The SPECIES software is open source and can be downloaded from http://species.jensenlab.org along with dictionary files and the manually annotated gold-standard corpus. The ORGANISMS web resource can be found at http://organisms.jensenlab.org.

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United Kingdom	1	33%
Unknown	2	67%

Demographic breakdown

Type	Count	As %
Scientists	2	67%
Members of the public	1	33%

Mendeley readers

The data shown below were compiled from readership statistics for 114 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Spain	3	3%
United States	3	3%
United Kingdom	2	2%
Brazil	1	<1%
Denmark	1	<1%
Germany	1	<1%
Japan	1	<1%
Mexico	1	<1%
Unknown	101	89%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	31	27%
Student > Master	24	21%
Student > Ph. D. Student	11	10%
Student > Bachelor	8	7%
Other	7	6%
Other	15	13%
Unknown	18	16%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	31	27%
Computer Science	27	24%
Biochemistry, Genetics and Molecular Biology	13	11%
Medicine and Dentistry	5	4%
Immunology and Microbiology	2	2%
Other	12	11%
Unknown	24	21%
