Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection

dc.contributor.author: Barrón-Cedeño, Alberto
dc.contributor.author: Vila Rigat, Marta
dc.contributor.author: Martí Antonin, M. Antònia
dc.contributor.author: Rosso, Paolo
dc.date.accessioned: 2014-02-04T08:41:54Z
dc.date.available: 2014-03-01T23:02:07Z
dc.date.issued: 2013-12-01
dc.date.updated: 2014-02-03T16:45:38Z
dc.description.abstract: Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. As a result, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms most frequently used when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
dc.format.extent: 23 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 619558
dc.identifier.issn: 0891-2017
dc.identifier.uri: https://hdl.handle.net/2445/49363
dc.language.iso: eng
dc.publisher: The MIT Press
dc.relation.isformatof: Reproduction of the document published at: http://dx.doi.org/10.1162/COLI_a_00153
dc.relation.ispartof: Computational Linguistics, 2013, vol. 39, num. 4, p. 917-947
dc.relation.projectID: info:eu-repo/grantAgreement/EC/FP7/269180/EU//WIQ-EI
dc.relation.projectID: info:eu-repo/grantAgreement/EC/FP7/246016/EU//ABCDE
dc.relation.uri: http://dx.doi.org/10.1162/COLI_a_00153
dc.rights: (c) Association for Computational Linguistics, 2013
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.source: Articles published in journals (Catalan Philology and General Linguistics)
dc.subject.classification: Plagiarism
dc.subject.classification: Paraphrase
dc.subject.classification: Computational linguistics
dc.subject.classification: Natural language processing (Computer science)
dc.subject.other: Plagiarism
dc.subject.other: Paraphrase
dc.subject.other: Computational linguistics
dc.subject.other: Natural language processing (Computer science)
dc.title: Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/publishedVersion

Files

Original bundle

Name: 619558.pdf
Size: 384.95 KB
Format: Adobe Portable Document Format