Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/229341
A comprehensive annotation of conserved protein domains in human endogenous retroviruses
Abstract
Human endogenous retroviruses (HERVs) occupy nearly 8% of the human genome, yet their protein-coding potential remains largely unexplored. Originating from ancestral retroviruses that infected germline cells, HERVs typically follow the canonical proviral structure LTR-gag-pol-env-LTR, in which gag, pol, and env encode structural, enzymatic, and envelope proteins, respectively. We present a comprehensive resource annotating conserved retroviral domains across more than 120,000 ORFs derived from internal HERV regions. Using a reproducible pipeline based on HMMER and InterProScan, we identified over 17,000 domain hits, primarily pol-encoded domains such as reverse transcriptase, RNase H, and protease, and quantified their structural conservation. Hundreds of domains exceed 95% alignment coverage, revealing a surprising abundance of full-length retrovirus-like domains in both young and ancient families. The HERVK (HML-2) subfamily retains the most complete polyprotein architecture, including 13 loci with nearly intact Gag, Pol, and Env, but full-length Pol domains are also found in HERVH, HERVW, and HERVE. Our annotations recover conserved catalytic motifs in Pol and transmembrane features in Env, enabling fine-grained functional interpretation. All results (BED, FASTA, domain sequences, InterProScan outputs, and transmembrane predictions) are provided as an open resource on Zenodo to support downstream analyses of HERV protein expression, immune modulation, and co-option in health and disease.
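To make the "alignment coverage" threshold concrete, the sketch below filters HMMER domain hits by how much of each profile HMM the alignment spans. It is an illustration, not the authors' pipeline. It assumes (1) hits come from `hmmsearch --domtblout` (profiles as queries, ORFs as targets), so column 6 (`qlen`) is the profile length and columns 16-17 are the profile coordinates; (2) coverage is defined as `(hmm_to - hmm_from + 1) / qlen`; and (3) an independent E-value cutoff of `1e-5`. The class and function names (`DomainHit`, `parse_domtblout`, `near_full_length`), the cutoff, and the two example rows are hypothetical and synthetic.

```python
"""Sketch: keep HMMER domain hits that span > 95% of their profile HMM.

Assumes hmmsearch --domtblout output (query = profile HMM, target = ORF).
Names, thresholds, and example rows are hypothetical.
"""
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class DomainHit:
    orf_id: str      # target sequence name (e.g. a HERV-derived ORF)
    profile: str     # query profile HMM name
    hmm_len: int     # profile HMM length (qlen, column 6)
    hmm_from: int    # first profile position aligned (column 16, 1-based)
    hmm_to: int      # last profile position aligned (column 17, inclusive)
    i_evalue: float  # independent E-value of this domain (column 13)

    @property
    def coverage(self) -> float:
        """Fraction of the profile HMM covered by this alignment."""
        return (self.hmm_to - self.hmm_from + 1) / self.hmm_len


def parse_domtblout(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Parse --domtblout rows, skipping blank lines and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()  # first 22 columns are whitespace-delimited
        yield DomainHit(
            orf_id=f[0],
            profile=f[3],
            hmm_len=int(f[5]),
            hmm_from=int(f[15]),
            hmm_to=int(f[16]),
            i_evalue=float(f[12]),
        )


def near_full_length(hits: Iterable[DomainHit],
                     min_cov: float = 0.95,
                     max_evalue: float = 1e-5) -> List[DomainHit]:
    """Keep significant hits whose profile coverage exceeds min_cov."""
    return [h for h in hits if h.coverage > min_cov and h.i_evalue <= max_evalue]


# Two synthetic rows in domtblout column order (values are made up).
EXAMPLE = """\
# target query ... (header lines start with '#')
orf_A - 950 rt_profile - 195 1e-50 170.2 0.1 1 1 2e-52 3e-50 165.0 0.1 2 194 300 492 298 495 0.98 -
orf_B - 400 rt_profile - 195 1e-10 40.0 0.2 1 1 5e-12 8e-10 38.0 0.2 40 120 10 90 8 95 0.90 -
"""

if __name__ == "__main__":
    hits = list(parse_domtblout(EXAMPLE.splitlines()))
    for h in hits:
        print(h.orf_id, h.profile, f"{h.coverage:.3f}")
    print([h.orf_id for h in near_full_length(hits)])
```

Measuring coverage against the profile rather than the ORF means a short but complete domain counts as full length even inside a long, partly degraded ORF. If `hmmscan` were used instead, query and target swap, so the profile length would come from column 3 (`tlen`).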
Citation
MONTSERRAT-AYUSO, Tomàs, PUJOL, Aurora and ESTEVE-CODINA, Anna. A comprehensive annotation of conserved protein domains in human endogenous retroviruses. NAR Genomics and Bioinformatics. 2026. Vol. 8, no. 1. [consulted: 15 June 2026]. Available at: https://hdl.handle.net/2445/229341