PLOS

Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript

Overview of attention for article published in PLOS ONE, July 2013

Mentioned by

5 news outlets
3 blogs
11 X users
1 Facebook page
5 Wikipedia pages
1 Google+ user
1 Redditor
1 YouTube creator

Citations

52 Dimensions

Readers on

66 Mendeley

Title: Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript
Published in: PLOS ONE, July 2013
DOI: 10.1371/journal.pone.0067310
Authors

Diego R. Amancio, Eduardo G. Altmann, Diego Rybski, Osvaldo N. Oliveira, Luciano da F. Costa

Abstract

While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
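Two of the measurement families named in the abstract, network topology and intermittency, can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a crude whitespace tokenizer, an undirected word-adjacency network in which two distinct words are linked when they occur next to each other, and the coefficient of variation of recurrence gaps as a simple stand-in for an intermittency measure. All function names (`tokenize`, `adjacency_degrees`, `intermittency`, `shuffled`) are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def adjacency_degrees(tokens):
    """Degree of each word in an undirected word-adjacency network.

    Assumption: two distinct words are linked if they are adjacent
    at least once in the token sequence.
    """
    neighbors = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {word: len(linked) for word, linked in neighbors.items()}


def intermittency(tokens, word):
    """Coefficient of variation (std / mean) of the gaps between
    successive occurrences of `word`, treating the text as a time series.

    Returns None when the word has fewer than two gaps
    (that is, fewer than three occurrences).
    """
    positions = [i for i, token in enumerate(tokens) if token == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return pstdev(gaps) / mean(gaps)


def shuffled(tokens, seed=0):
    """Return a shuffled copy of the tokens: same word frequencies,
    word order destroyed. Serves as the null model for comparison."""
    out = list(tokens)
    random.Random(seed).shuffle(out)
    return out
```

Under these assumptions, a measurement is informative in the sense of the abstract if its value on the original token sequence differs clearly from its value on `shuffled(tokens)`. Shuffling keeps word frequencies fixed, so any difference reflects word order rather than vocabulary.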

X Demographics

The data shown below were collected from the profiles of 11 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 66 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Germany    3        5%
Brazil     2        3%
Italy      1        2%
Canada     1        2%
Belarus    1        2%
Unknown    58       88%

Demographic breakdown

Readers by professional status     Count    As %
Researcher                         14       21%
Professor > Associate Professor    10       15%
Student > Master                   8        12%
Professor                          6        9%
Student > Bachelor                 5        8%
Other                              18       27%
Unknown                            5        8%

Readers by discipline                   Count    As %
Physics and Astronomy                   13       20%
Computer Science                        11       17%
Social Sciences                         7        11%
Agricultural and Biological Sciences    4        6%
Medicine and Dentistry                  4        6%
Other                                   19       29%
Unknown                                 8        12%