Report for: How to Get the Most out of Your Curation Effort

Title	How to Get the Most out of Your Curation Effort
Published in	PLoS Computational Biology, May 2009
DOI	10.1371/journal.pcbi.1000391
PubMed ID	19461884
Authors	Andrey Rzhetsky, Hagit Shatkay, W. John Wilbur
Abstract	Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that allows providing each annotation with a confidence value (i.e., the posterior probability that it is correct). Our approach allows computing certainty-level for individual annotations, given annotator-specific parameters estimated from data. We developed two probabilistic models for performing this analysis, compared these models using computer simulation, and tested each model's actual performance, based on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach allows us to significantly increase the probability of choosing the correct annotation. Along with this publication we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we have introduced in earlier work. The 10,000 sentences were all 3-fold annotated by a group of eight experts, while a 1,000-sentence subset was further 5-fold annotated by five new experts. While the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.


X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Germany	1	50%
Unknown	1	50%

Demographic breakdown

Type	Count	As %
Members of the public	1	50%
Scientists	1	50%

Mendeley readers

The data shown below were compiled from readership statistics for 80 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	8	10%
Germany	2	3%
United Kingdom	2	3%
France	2	3%
Mexico	2	3%
Norway	1	1%
Sweden	1	1%
New Zealand	1	1%
Portugal	1	1%
Other	2	3%
Unknown	58	73%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	27	34%
Student > Ph. D. Student	15	19%
Student > Master	8	10%
Professor > Associate Professor	6	8%
Student > Bachelor	5	6%
Other	13	16%
Unknown	6	8%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	36	45%
Computer Science	24	30%
Medicine and Dentistry	3	4%
Psychology	2	3%
Business, Management and Accounting	1	1%
Other	8	10%
Unknown	6	8%
