Bitacora: A comprehensive tool for the identification and annotation of gene families in genome assemblies

dc.contributor.author: Vizueta Moraga, Joel
dc.contributor.author: Sánchez-Gracia, Alejandro
dc.contributor.author: Rozas Liras, Julio A.
dc.date.accessioned: 2023-03-08T15:13:46Z
dc.date.available: 2023-03-08T15:13:46Z
dc.date.issued: 2020-09
dc.date.updated: 2023-03-08T15:13:46Z
dc.description.abstract: Gene annotation is a critical bottleneck in genomic research, especially for the comprehensive study of very large gene families in the genomes of nonmodel organisms. Despite recent progress in automatic methods, state-of-the-art tools used for this task often produce inaccurate annotations, such as fused, chimeric, partial or even completely absent gene models for many family copies, errors that require considerable extra effort to correct. Here we present BITACORA, a bioinformatics solution that integrates popular sequence similarity-based search tools and Perl scripts to facilitate both the curation of these inaccurate annotations and the identification of previously undetected gene family copies directly in genomic DNA sequences. We tested the performance of BITACORA in annotating the members of two chemosensory gene families with different repertoire sizes in seven available genome sequences, and compared it with AUGUSTUS-PPX, a tool also designed to improve automatic annotations using a sequence similarity-based approach. Despite the relatively high fragmentation of some of these draft assemblies, BITACORA improved the annotation of many members of these families and detected thousands of new chemoreceptors encoded in the genome sequences. The program creates general feature format (GFF) files, with both curated and newly identified gene models, and FASTA files with the predicted proteins. These outputs can be easily integrated into genome annotation editors, greatly facilitating subsequent manual annotation and downstream evolutionary analyses.
dc.format.extent: 8 p.
dc.format.mimetype: application/pdf
dc.identifier.idgrec: 703000
dc.identifier.issn: 1755-098X
dc.identifier.uri: https://hdl.handle.net/2445/194886
dc.language.iso: eng
dc.publisher: John Wiley & Sons
dc.relation.isformatof: Postprint version of the document published at: https://doi.org/10.1111/1755-0998.13202
dc.relation.ispartof: Molecular Ecology Resources, 2020, vol. 20, num. 5, p. 1445-1452
dc.relation.uri: https://doi.org/10.1111/1755-0998.13202
dc.rights: (c) John Wiley & Sons, 2020
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.source: Articles published in journals (Genetics, Microbiology and Statistics)
dc.subject.classification: Genomics
dc.subject.classification: Molecular evolution
dc.subject.other: Genomics
dc.subject.other: Molecular evolution
dc.title: Bitacora: A comprehensive tool for the identification and annotation of gene families in genome assemblies
dc.type: info:eu-repo/semantics/article
dc.type: info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: 703000.pdf
Size: 280.69 KB
Format: Adobe Portable Document Format