A comprehensive annotation of conserved protein domains in human endogenous retroviruses

dc.contributor.author: Montserrat-ayuso, Tomàs
dc.contributor.author: Pujol, Aurora
dc.contributor.author: Esteve-codina, Anna
dc.date.accessioned: 2026-05-06T09:38:43Z
dc.date.available: 2026-05-06T09:38:43Z
dc.date.issued: 2026-01-06
dc.date.updated: 2026-02-25T10:14:22Z
dc.description.abstract: Human endogenous retroviruses (HERVs) occupy nearly 8% of the human genome, yet their protein-coding potential remains largely unexplored. Originating from ancestral retroviruses that infected germline cells, HERVs typically follow the canonical proviral structure LTR-gag-pol-env-LTR, where gag, pol, and env encode structural, enzymatic, and envelope proteins. We present a comprehensive resource annotating conserved retroviral domains across more than 120,000 ORFs derived from internal HERV regions. Using a reproducible pipeline based on HMMER and InterProScan, we identified over 17,000 domain hits (primarily from pol genes such as reverse transcriptase, RNase H, and protease) and quantified their structural conservation. Hundreds of domains exceed 95% alignment coverage, revealing a surprising abundance of full-length retrovirus-like domains in both young and ancient families. The HERVK (HML-2) subfamily retains the most complete polyprotein architecture, including 13 loci with nearly intact Gag, Pol, and Env, but full-length Pol domains are also found in HERVH, HERVW, and HERVE. Our annotations recover conserved catalytic motifs in Pol and transmembrane features in Env, enabling fine-grained functional interpretation. All results, including BED, FASTA, domain sequences, InterProScan outputs, and transmembrane predictions, are provided as an open resource at Zenodo to support downstream analyses of HERV protein expression, immune modulation, and co-option in health and disease.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2445/229341
dc.language.iso: eng
dc.publisher: Oxford University Press (OUP)
dc.relation.isformatof: Reproduction of the document published at: https://doi.org/10.1093/nargab/lqag013
dc.relation.ispartof: NAR Genomics and Bioinformatics, 2026, vol. 8, issue 1
dc.relation.uri: https://doi.org/10.1093/nargab/lqag013
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.source: Articles published in journals (Institut d'Investigació Biomèdica de Bellvitge (IDIBELL))
dc.title: A comprehensive annotation of conserved protein domains in human endogenous retroviruses
dc.type: info:eu-repo/semantics/article

Files

Original bundle

Name: lqag013.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format