A comprehensive annotation of conserved protein domains in human endogenous retroviruses
| dc.contributor.author | Montserrat-Ayuso, Tomàs | |
| dc.contributor.author | Pujol, Aurora | |
| dc.contributor.author | Esteve-Codina, Anna | |
| dc.date.accessioned | 2026-05-06T09:38:43Z | |
| dc.date.available | 2026-05-06T09:38:43Z | |
| dc.date.issued | 2026-01-06 | |
| dc.date.updated | 2026-02-25T10:14:22Z | |
| dc.description.abstract | Human endogenous retroviruses (HERVs) occupy nearly 8% of the human genome, yet their protein-coding potential remains largely unexplored. Originating from ancestral retroviruses that infected germline cells, HERVs typically follow the canonical proviral structure LTR-gag-pol-env-LTR, where gag, pol, and env encode structural, enzymatic, and envelope proteins, respectively. We present a comprehensive resource annotating conserved retroviral domains across more than 120 000 ORFs derived from internal HERV regions. Using a reproducible pipeline based on HMMER and InterProScan, we identified over 17 000 domain hits (primarily from pol genes, such as reverse transcriptase, RNase H, and protease) and quantified their structural conservation. Hundreds of domains exceed 95% alignment coverage, revealing a surprising abundance of full-length retrovirus-like domains in both young and ancient families. The HERVK (HML-2) subfamily retains the most complete polyprotein architecture, including 13 loci with nearly intact Gag, Pol, and Env, but full-length Pol domains are also found in HERVH, HERVW, and HERVE. Our annotations recover conserved catalytic motifs in Pol and transmembrane features in Env, enabling fine-grained functional interpretation. All results (including BED, FASTA, domain sequences, InterProScan outputs, and transmembrane predictions) are provided as an open resource at Zenodo to support downstream analyses of HERV protein expression, immune modulation, and co-option in health and disease. | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.uri | https://hdl.handle.net/2445/229341 | |
| dc.language.iso | eng | |
| dc.publisher | Oxford University Press (OUP) | |
| dc.relation.isformatof | Reproduction of the document published at: https://doi.org/10.1093/nargab/lqag013 | |
| dc.relation.ispartof | NAR Genomics and Bioinformatics, 2026, vol. 8, issue 1 | |
| dc.relation.uri | https://doi.org/10.1093/nargab/lqag013 | |
| dc.rights.accessRights | info:eu-repo/semantics/openAccess | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.source | Articles published in journals (Institut d'Investigació Biomèdica de Bellvitge (IDIBELL)) | |
| dc.title | A comprehensive annotation of conserved protein domains in human endogenous retroviruses | |
| dc.type | info:eu-repo/semantics/article |