PLOS

Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics

Overview of attention for article published in PLoS Computational Biology, February 2013

Mentioned by

1 blog
27 X users
1 Facebook page
1 Wikipedia page

Readers on

290 Mendeley
12 CiteULike
Title
Getting More Out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics
Published in
PLoS Computational Biology, February 2013
DOI 10.1371/journal.pcbi.1002854
Pubmed ID
Authors

Hamish Cunningham, Valentin Tablan, Angus Roberts, Kalina Bontcheva

Abstract

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

X Demographics

The data shown below were collected from the profiles of 27 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 290 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Spain 3 1%
Brazil 3 1%
Germany 2 <1%
United Kingdom 2 <1%
Portugal 2 <1%
United States 2 <1%
New Zealand 2 <1%
Australia 1 <1%
India 1 <1%
Other 3 1%
Unknown 269 93%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 60 21%
Researcher 51 18%
Student > Master 38 13%
Student > Bachelor 23 8%
Other 20 7%
Other 62 21%
Unknown 36 12%

Readers by discipline Count As %
Computer Science 99 34%
Agricultural and Biological Sciences 34 12%
Medicine and Dentistry 25 9%
Biochemistry, Genetics and Molecular Biology 12 4%
Psychology 12 4%
Other 51 18%
Unknown 57 20%