A Practical Approach to Language Complexity: A Wikipedia Case Study

Overview of attention for article published in PLOS ONE, November 2012

Mentioned by

26 X users
1 Wikipedia page
1 Google+ user

Citations

50 Dimensions

Readers on

79 Mendeley
5 CiteULike

Title: A Practical Approach to Language Complexity: A Wikipedia Case Study
Published in: PLOS ONE, November 2012
DOI: 10.1371/journal.pone.0048386
Authors: Taha Yasseri, András Kornai, János Kertész

Abstract

In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use simpler language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of the two samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity: the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
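
For readers unfamiliar with the Gunning readability index mentioned in the abstract, the following is a minimal Python sketch of the standard Gunning fog formula, 0.4 × (average sentence length + percentage of words with three or more syllables). It is not the authors' implementation: the function names `count_syllables` and `gunning_fog` are hypothetical, the syllable counter is a crude vowel-group heuristic, and the usual exclusions (proper nouns, compound words, common suffixes) are ignored for brevity.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Simplified Gunning fog index of an English text."""
    # Split into sentences on terminal punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    # "Complex" words are approximated as words with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]

    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


if __name__ == "__main__":
    short_sentences = "The cat sat. The dog ran."
    one_long_sentence = "The cat sat and the dog ran"
    # Same simple vocabulary, but the longer sentence yields a higher index.
    print(gunning_fog(short_sentences))
    print(gunning_fog(one_long_sentence))
```

Because the index depends on both sentence length and the share of long words, two texts with similar vocabulary can still differ in score purely through sentence length, which is the kind of effect the abstract attributes to Simple versus Main.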

Mendeley readers

The data shown below were compiled from readership statistics for 79 Mendeley readers of this research output.

Geographical breakdown

Country           Readers   % of readers
United States     4         5%
Netherlands       2         3%
United Kingdom    2         3%
Germany           1         1%
Brazil            1         1%
Canada            1         1%
Italy             1         1%
Japan             1         1%
Spain             1         1%
Other             2         3%
Unknown           63        80%

Demographic breakdown

Readers by professional status   Readers   % of readers
Student > Master                 16        20%
Researcher                       15        19%
Student > Ph. D. Student         12        15%
Other                            6         8%
Student > Doctoral Student       5         6%
Other                            16        20%
Unknown                          9         11%

Readers by discipline            Readers   % of readers
Computer Science                 20        25%
Linguistics                      13        16%
Physics and Astronomy            10        13%
Social Sciences                  7         9%
Medicine and Dentistry           4         5%
Other                            16        20%
Unknown                          9         11%