Modeling the structure of written text

dc.contributor.author: Serrano Moral, Ma. Ángeles (María Ángeles)
dc.contributor.author: Flammini, Alessandro
dc.contributor.author: Menczer, Filippo
dc.date.accessioned: 2016-02-15T12:42:18Z
dc.date.available: 2016-02-15T12:42:18Z
dc.date.issued: 2009-04-29
dc.date.updated: 2016-02-15T12:42:18Z
dc.description.abstract: Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the nontrivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
dc.format.extent: 8 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 567489
dc.identifier.issn: 1932-6203
dc.identifier.uri: https://hdl.handle.net/2445/69418
dc.language.iso: eng
dc.publisher: Public Library of Science (PLoS)
dc.relation.isformatof: Reproduction of the document published at: http://dx.doi.org/10.1371/journal.pone.0005372
dc.relation.ispartof: PLoS One, 2009, vol. 4, p. e5372
dc.relation.uri: http://dx.doi.org/10.1371/journal.pone.0005372
dc.rights: cc-by (c) Serrano Moral, Ma. Ángeles (María Ángeles) et al., 2009
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es
dc.source: Articles publicats en revistes (Física de la Matèria Condensada)
dc.subject.classification: Comunicació escrita
dc.subject.classification: Algorismes
dc.subject.classification: Estadística
dc.subject.other: Written communication
dc.subject.other: Algorithms
dc.subject.other: Statistics
dc.title: Modeling the structure of written text
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion
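
Two of the regularities named in the abstract, Zipf's law for word frequencies and Heaps' law for vocabulary growth, can be measured on any tokenized text. The sketch below is only an illustration of those two measurements, not the generative model described in the article; the function names `zipf_table` and `heaps_curve` and the toy sentence are assumptions introduced here.

```python
from collections import Counter


def zipf_table(tokens):
    """Return (rank, word, count) triples, most frequent word first.

    Zipf's law concerns how count falls off as rank increases.
    """
    ranked = Counter(tokens).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def heaps_curve(tokens):
    """Return the number of distinct words seen after each token.

    Heaps' law concerns the sublinear growth of this curve
    with the length of the text.
    """
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token)
        curve.append(len(seen))
    return curve


# Toy input (an assumption for illustration, not data from the article).
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()

table = zipf_table(tokens)
curve = heaps_curve(tokens)

for rank, word, count in table[:3]:
    print(rank, word, count)
print("tokens:", len(tokens), "distinct words:", curve[-1])
```

On a real corpus one would typically plot count against rank, and vocabulary size against text length, on logarithmic axes to inspect the two trends; a sentence this short can only show the mechanics.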

Files

Original bundle

Name: 567489.pdf
Size: 583.85 KB
Format: Adobe Portable Document Format