Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/49363
Title: Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection
Author: Barrón-Cedeño, Alberto
Vila Rigat, Marta
Martí Antonin, M. Antònia
Rosso, Paolo
Keywords: Plagiarism
Paraphrase
Computational linguistics
Natural language processing (Computer science)
Issue Date: 1-Dec-2013
Publisher: The MIT Press
Abstract: Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource which uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
Note: Reproduction of the document published at: http://dx.doi.org/10.1162/COLI_a_00153
It is part of: Computational Linguistics, 2013, vol. 39, num. 4, p. 917-947
URI: http://hdl.handle.net/2445/49363
Related resource: http://dx.doi.org/10.1162/COLI_a_00153
ISSN: 0891-2017
Appears in Collections: Articles publicats en revistes (Filologia Catalana i Lingüística General)

Files in This Item:
File: 619558.pdf
Size: 384.95 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.