An assessment of gene prediction accuracy in large DNA sequences

Guigó, Roderic; Agarwal, Pankaj; Abril Ferrando, Josep Francesc; Burset Albareda, Moisès; Fickett, J.W.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/192700

Full metadata record

DC Field	Value	Language
dc.contributor.author	Guigó, Roderic	-
dc.contributor.author	Agarwal, Pankaj	-
dc.contributor.author	Abril Ferrando, Josep Francesc, 1970-	-
dc.contributor.author	Burset Albareda, Moisès	-
dc.contributor.author	Fickett, J.W.	-
dc.date.accessioned	2023-01-27T07:13:28Z	-
dc.date.available	2023-01-27T07:13:28Z	-
dc.date.issued	2000	-
dc.identifier.issn	1088-9051	-
dc.identifier.uri	https://hdl.handle.net/2445/192700	-
dc.description.abstract	One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the ∼200 kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX, was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.	-
dc.format.extent	13 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Cold Spring Harbor Laboratory Press	-
dc.relation.isformatof	Reproduction of the document published at: https://doi.org/10.1101/gr.143200	-
dc.relation.ispartof	Genome Research, 2000, vol. 10, num. 10, p. 1631-1642	-
dc.relation.uri	https://doi.org/10.1101/gr.143200	-
dc.rights	cc-by-nc (c) Guigó, Roderic et al., 2000	-
dc.rights.uri	https://creativecommons.org/licenses/by-nc/4.0/	-
dc.source	Articles publicats en revistes (Genètica, Microbiologia i Estadística)	-
dc.subject.classification	ADN	-
dc.subject.classification	Genoma humà	-
dc.subject.classification	Seqüència de nucleòtids	-
dc.subject.classification	Genètica humana	-
dc.subject.other	DNA	-
dc.subject.other	Human genome	-
dc.subject.other	Nucleotide sequence	-
dc.subject.other	Human genetics	-
dc.title	An assessment of gene prediction accuracy in large DNA sequences	-
dc.type	info:eu-repo/semantics/article	-
dc.type	info:eu-repo/semantics/publishedVersion	-
dc.identifier.idgrec	555701	-
dc.date.updated	2023-01-27T07:13:28Z	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
Appears in Collections:	Articles publicats en revistes (Genètica, Microbiologia i Estadística)

Files in This Item:

File	Description	Size	Format
555701.pdf		3.51 MB	Adobe PDF

This item is licensed under a Creative Commons License