High Transcriptional Complexity of the Retinitis Pigmentosa CERKL Gene in Human and Mouse

METHODS. Customized in silico genomic and transcriptomic analyses, combined with experimental RT-PCR on different human and murine tissues and cell lines and with immunohistochemistry, were used to characterize the transcriptional spectrum of CERKL. In the mouse retina, Cerkl is detected primarily in ganglion cells and cones but can also be observed in rods. Cerkl is mainly cytosolic. It localizes in the outer segments of photoreceptors and in the perinuclear regions of some cells.

Spatiotemporal differential splicing, often related to developmental events or tissue differentiation processes, affects >95% of the human genes, as recently unveiled after massive sequencing of the human transcriptome.1,2 Alternative splicing and the use of alternative promoters and transcriptional splice sites are instrumental for the generation of complexity, as proteins with different functions are encoded by the transcript variants produced. Cells can thus deploy a wide array of proteins, all arising from a single genomic sequence.3,4
Misregulation of alternative splicing is often at the basis of human disease, given that distortions in the splicing process either directly alter the domains displayed by proteins or, more relevant to pathology, cause frameshifts that are frequently associated with premature stop codons.5 Therefore, prior knowledge of all the physiologically produced transcripts from a gene of interest is crucial to draw genotype-phenotype correlations in hereditary diseases and to infer the degree of pathologic severity.6-8 This is even more relevant when considering genetic disorders of the mammalian central nervous system (CNS) and derived neurologic tissues, such as the retina, in which the highest degree of alternative splicing events occurs.9-11
Retinitis pigmentosa (RP) is a hereditary neurodegenerative disorder with extremely high genetic heterogeneity. It affects 1:4000 people worldwide, and it is the major cause of nontraumatic adult blindness.12 Although >45 genes have been identified as causative of RP (Retnet, http://www.sph.uth.tmc.edu/Retnet/), approximately 40% of the genetic cases remain unassigned, highlighting the relevance of identifying new candidates because each gene will presumably explain very few cases. 4][15] These findings widen the range of molecular mechanisms underlying tissue-restricted abnormalities, decrease the number of unknown RP genes, illuminate new scenarios for tissue-specific gene function, and emphasize the need for accurate characterization of candidate splicing products, particularly because 70% of the exons in the human genome are tissue specific.1,16
Our group first identified CERKL as an RP gene 17 by detecting a homozygous nonsense mutation (R257X) that cosegregated in consanguineous Spanish families. CERKL was widely expressed, and the highest transcription levels were observed in the retina.17,18 Interestingly, the R257X mutation was embedded in an alternatively spliced exon; therefore, some of the CERKL isoforms were a priori functional in the patients.19
These results prompted us to undertake a more accurate characterization of the CERKL transcripts in human and mouse. Our work unveils an unexpectedly high complexity of the CERKL transcripts, particularly at the 5′ end of the gene, with alternative first exons, inclusion/exclusion of alternatively spliced exons, intron retention, and additional splice sites.

RNA Extraction and RT-PCR
For total RNA extraction, a tissue kit (High Pure RNA Tissue Kit; Roche Diagnostics, Indianapolis, IN) was used in accordance with the manufacturer's instructions. Human and mouse blood RNA was mixed (RNAlater; Ambion/Applied Biosystems, Foster City, CA) before extraction (RiboPure-Blood Kit; Ambion/Applied Biosystems). Saliva samples were treated as indicated (Oragene/RNA protocol; DNA Genotek Inc., Ontario, Canada), and RNA was extracted from human cultured cells (RNeasy kit; Qiagen, Germantown, MD). RT-PCR assays were performed for human and mouse samples (Mint Kit [Evrogen, Moscow, Russia] or Transcriptor High Fidelity cDNA Synthesis Kit [Roche Diagnostics, Indianapolis, IN]). For tissue expression analysis, all reaction mixtures (50 µL) contained 10 µM each primer pair, 2 µM dNTPs, 1.5 mM MgCl2, and 1 U polymerase (GoTaq; Promega, Madison, WI). Primer localizations are depicted in Figures 1A2 (human) and 1B2 (mouse), and the sequences are given in Supplementary Table S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental. CERKL was amplified using primers A and B for human and a and b for mouse (120 seconds at 94°C followed by 35 cycles of 94°C for 20 seconds, 60°C for 30 seconds, and 72°C for 30/20 seconds). GAPDH was used for normalization (120 seconds at 94°C and 30 cycles of 94°C for 20 seconds and 63°C for 120 seconds).
Analysis of the 5′ and 3′ UTRs of human and murine CERKL retina isoforms was performed using either the Plug adaptor or oligo-d(T) primers (provided in the Mint Kit; Evrogen) paired with suitable CERKL-specific internal primers under the indicated PCR conditions. The characterization of alternatively spliced variants and promoters was performed using a combination of the internal primers located in different exons. The primers were designed to share the same amplification conditions: 120 seconds at 94°C followed by 40 cycles of 94°C for 20 seconds, 58°C for 30 seconds, and 72°C for 90 seconds. All sequences have been submitted to GenBank (accession numbers are shown in Supplementary Tables S2A and S2B, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental).

Transfections and Recombinant Protein Expression and Immunodetection
For protein expression, HEK293T cells (2 × 10^5 cells) were seeded and transfected using reagent (Lipofectamine 2000; Invitrogen Life Technologies, Carlsbad, CA), according to the manufacturer's protocol. The recombinant constructs were obtained by cloning representative human cDNA isoforms (h2, h13, h18 in Figs. 1A and 4A) with an in-frame HA epitope fused at the C terminus into pcDNA 3.1 (Clontech Laboratories, Inc., Mountain View, CA). After 48 hours, cells were lysed with 1× protein loading buffer and boiled for 5 minutes. Protein lysates were loaded onto 12% SDS-PAGE gels that were transferred and analyzed by Western blot. Immunodetection was performed with a primary monoclonal anti-HA (1:1000) and HRP-conjugated anti-mouse secondary antibody (1:3000). Tubulin immunodetection was used as a loading control.

Bioinformatic Analysis of the Genomic Human CERKL Locus
Most of the computational analyses were performed using the genomic sequence of the human CERKL locus at chromosome 2 (March 2006 assembly version [NCBI36/hg18]) within the interval 182,029,864 bp to 182,259,440 bp (including the ITGA4 and NEUROD1 loci), which was retrieved from the UCSC human genome browser.21 However, for the purpose of comparative genomics and to determine the conservation between human and other vertebrates (such as Macaca mulatta, Mus musculus, Gallus gallus, and Takifugu rubripes), precomputed whole genome alignments were analyzed through the VISTA UCSC browser mirror, which provides the VISTA track feature.22 The syntenic region of the mouse genome was also retrieved. BLASTN and TBLASTX alignments were performed on the syntenic sequences using the NCBI bl2seq algorithm 23 for a more in-depth comparison between human and mouse.
Previously described CERKL isoforms were retrieved from several databases: RefSeq,24 GenBank,25 dbTSS,26 and VEGA.27 Some of the dbTSS transcripts were already mapped on the human CERKL genomic region at the VEGA Web site. These sequences, as well as experimentally validated CERKL cDNAs (this work), were mapped onto the analyzed sequence interval using Exonerate,28 following the est2genome model algorithm for easier comparison of all the exonic structures from both the database and experimental evidence (complete visualization is shown in Supplementary Fig. S1). Although a track for the First-Exon-Finder program 29 on the UCSC genome browser was already available, an additional attempt was performed to predict more CpG islands, promoters, and first exons on the CERKL genomic region (cutoff value for the first-exon a posteriori probability [APP] = 0.5 and cutoff value for the promoter APP = 0.4).
In addition to those generated by MEME, a new set of matrices corresponding to a selection of known transcription initiation factors (including TATA, CAAT, USF, INI, SRF, SP1, and TFIIA) was downloaded from TransFac.31 Retina-related transcription factor matrices (for PAX6, AP1, ZF5, AP2REP, AP2ALPHA, AP2GAMMA, TBP, MAZR, CRX, GATA4, SP3, ETF, KROX, WT1, NR2E3, and V-MAF) were also gathered from TransFac, Promo,32 and Jaspar.33 All the matrices were mapped onto the analyzed genomic region of CERKL using custom Perl scripts with the specific purpose of defining potential novel alternative transcription starting sites (TSS) for CERKL isoforms. The hit scores on the genomic sequence were normalized between 0 and 1; a threshold was then defined as the 95th percentile of the distribution of all those scores. Only hits of matrices showing a normalized score equal to or greater than the threshold were considered (a summary of those found within the 1 kbp upstream of every reported human and mouse CERKL exon that included a TSS is provided in Supplementary Tables S4A [human] and S4B [mouse], http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental).
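The scoring scheme described above (scan each matrix along the sequence, rescale the scores to the 0 to 1 range, and keep hits at or above the 95th percentile) can be illustrated with a minimal Python sketch. This is not the custom Perl code used in this work: the function names, the nearest-rank percentile rule, and the toy TATA-like matrix are assumptions made only for illustration.

```python
import math

# Hypothetical toy matrix: one dict of base -> score per motif position.
# Real matrices (TransFac, Jaspar, MEME) would be loaded from files instead.
TOY_TATA = [{"T": 1.0}, {"A": 1.0}, {"T": 1.0}, {"A": 1.0}]


def scan_pwm(seq, pwm):
    """Return the raw matrix score of every window start along seq."""
    width = len(pwm)
    scores = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        scores.append(sum(pwm[j].get(base, 0.0) for j, base in enumerate(window)))
    return scores


def normalize(scores):
    """Rescale scores linearly so that they lie between 0 and 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def percentile(values, q):
    """Nearest-rank percentile (an assumed convention, q in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def significant_hits(seq, pwm, q=95):
    """Keep (position, normalized score) pairs at or above the q-th percentile."""
    norm = normalize(scan_pwm(seq, pwm))
    cutoff = percentile(norm, q)
    return [(i, s) for i, s in enumerate(norm) if s >= cutoff]
```

With this sketch, `significant_hits("GGGTATAGGG", TOY_TATA)` keeps only the window that starts at position 3, where the sequence matches the toy motif exactly.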
Putative translation start sites were evaluated using the Kozak matrix 34 under the same terms. Moreover, the ENCODE H3K4Me3 track 35 on the UCSC genome browser was also considered as additional transcriptional evidence, given that histone modification correlates with transcriptionally active sites.36 The distribution of SNPs across the exons of the CERKL gene was analyzed using dbSNP31, over the hg19 database.

Comprehensive Identification of Alternatively Spliced CERKL Isoforms
Evidence of different alternatively spliced isoforms of CERKL has been reported, but a comprehensive prioritized list of the physiologically relevant transcripts is still missing.19,37 Furthermore, its wide tissular expression 17,18 appears to be inconsistent with the tissue-restricted phenotype of CERKL mutations because only the retina was affected. In this case, as happens with other retina-associated disease genes, tissue-specific isoforms might reconcile this apparent paradox.14
Thus, we first aimed to exhaustively characterize the CERKL alternatively spliced isoforms generated in human and murine retinas and to perform an interspecific comparative analysis. Two different methods for the synthesis of the cDNAs (detailed description in Materials and Methods) were used to replicate the experiments, validate the sequences, and avoid technical biases. For a comprehensive isoform characterization, we performed 5′ and 3′ RACE reactions to identify initial and terminal UTRs on endogenously expressed retinal transcripts and subsequently used a battery of internal PCR primers (listed in Supplementary Table S1 [http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental] and located in Figs. 1A2 [human] and 1B2 [mouse]) to unveil the combinatorial network of alternative promoters and exons displayed in CERKL transcripts. From these data we designed specific primers to identify fully processed transcripts encompassing the first to the last exon and thus depict the complete repertoire of CERKL aligned with the genomic primary structure as a means to validate each transcript variant.
Overall, the retinal CERKL isoforms generated by alternative splicing events showed an unexpected complexity because >20 transcripts were identified in human and mouse retinas.
The genomic organization of CERKL with the splicing events (depicted as angled lines) and 5′ UTRs (gray boxes) identified is shown in Figures 1A (human) and 1B (mouse). The most abundant transcripts are indicated by the # symbol. For each human and mouse transcript, the 3′ UTR was unique, although murine transcripts contained a longer 3′ UTR than previously reported, pointing to two polyadenylation signals. Notably, in the two species, the 5′ UTRs showed an unexpected multiplicity of TSS that contributed to the combinatorial complexity of the mature transcripts. This heterogeneity called for a rational and comprehensive nomenclature of all CERKL variants in human and mouse. Therefore, sequences from published reports, databases, and this work were gathered and systematized. Our proposal is presented in Supplementary Tables S2A and S2B (http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental).
In detail, the analysis of the 20 fully validated human transcripts provided solid evidence of four different CERKL TSS (Fig. 1A). Eleven transcripts were expressed from the previously reported 5′ UTR; two from the starting site of the adjacent upstream NEUROD1 gene (known to be highly expressed in the CNS and transcribed in the same direction as CERKL); six from an internal, previously unknown initiation site within exon 1 (referred to as exon 1b in the text and Supplementary material, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental); and one started from an internal sequence of exon 3 (referred to as exon 3a). Of note, the TSS of exon 1b was also supported in silico by the First-Exon-Finder, which, among other structural features, mapped a CpG island within this genomic region, and by the clustering of peaks of the H3K4Me3 track, indicative of transcriptionally active chromatin sites (Fig. 2). Yet we cannot rule out that CERKL is transcribed from unknown TSS in other tissues. In this context, the UCSC genome browser has recently incorporated an ENCODE track that corresponds to manually annotated genes, based primarily on sequenced full-length cDNAs from dbTSS plus reports from independent sources. Twelve of the 15 ENCODE CERKL variants fully overlapped with some retinal transcripts described in this work. Of the remaining three, one (OTTHUMT00000334820) started at a TSS extremely close to the reported CERKL 5′ UTR and possibly was structurally equivalent; the other two (OTTHUMT00000334817 and OTTHUMT00000334818) started at completely different internal sites, suggesting two additional TSS. If the latter two isoforms were validated, the number of CERKL TSS in human would amount to six.
In contrast, in the murine retina, only three Cerkl start sites were experimentally identified (Fig. 1B, dark gray): 11 (of 23) fully validated transcripts started from the previously reported Cerkl site, 11 from the upstream NeuroD1 gene (as in human), and the last from the novel exon 3a, located in intron 2. The latter is also supported by the dbTSS database. Moreover, RT-PCR assays performed in a panel of several tissues provided evidence for an additional TSS within intron 2, which generated exon 3b (not found in the retina). A complete list specifying the contribution (presence or absence) of every exon in each CERKL/Cerkl isoform is presented in Supplementary Table S3, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental.
To identify the more abundant transcripts and approach their relative physiological relevance (Figs. 3A [human] and 3B [mouse]), we used a battery of primers, located either at the different TSS or the alternative exons at the 5′ end of CERKL, paired with a unique reverse primer in exon 10 (human) or exon 12 (mouse). The location of the primers is indicated in Figure 3C. For isoform assignment, each amplified product was isolated and sequenced. The RT-PCRs were replicated several times. The interspecific comparison of the more abundant transcripts in the retina revealed a higher number of CERKL variants in human (8 of 20 transcripts, with a comparable level of expression) than in mouse (3 of 23 transcripts, with one major variant).
Concerning the CERKL/Cerkl protein isoforms, our data reveal that the combination of TSS multiplicity with the high number of alternative splicing events affecting the first exons (exons 1-6) generates a complex pattern of mature transcripts that differ at the 5′ end but share the 3′ moiety (exons 6-13), as shown in Figures 3A (human) and 3B (mouse). The alternative 5′ exons encode the nuclear localization signals,18,38 the putative pleckstrin homology (PH) domain, and the diacylglycerol kinase (DAGK) signatures.17,18,38,39 In addition, the human gene includes an in-frame species-specific alternative exon (4b) embedded in the predicted DAGK domain that interrupts the DAGK consensus signature. The comparison of human and mouse CERKL mature mRNAs showed that although the number of isoforms is similar, intron retention is more frequent in mouse than in human (Figs. 1A2, 1B2, isoforms m9, m10, m11). These transcripts bear premature stop codons and may be candidates for degradation by the nonsense-mediated decay (NMD) mechanism but, if translated, would encode a C-terminal-truncated protein.

Evidence for CERKL Alternative Translational Initiation Sites
Interestingly, one of the consequences of the use of alternative TSS is that the previously reported initiation Met codon is not always included in the mature transcript. Then, additional translation initiation sites (TIS) should be considered. In silico sequence analyses using motif searches with a Kozak matrix predicted several TIS along the CERKL transcripts (Supplementary Table S5, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental). Of these, only two encoded long peptide sequences, whereas the remaining putative TIS yielded a lower score value or would generate very short peptides. Initiation codons with significant TIS scores are indicated in Figures 1A2 and 1B2. For each isoform, only the longest open-reading frame starting with a high-score Met is depicted (filled boxes).
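The selection of the longest open-reading frame that starts at a well-supported Met can be sketched in Python as follows. This is only an illustration under stated assumptions: the Kozak matrix used in this work is replaced here by a simplified rule (a purine at position -3 and a G at position +4 relative to the A of the ATG), and the function names and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_end(seq, start):
    """Index just past the first in-frame stop codon, or None if there is none."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def strong_kozak(seq, start):
    """Simplified context check (assumption): purine at -3 and G at +4."""
    if start < 3 or start + 3 >= len(seq):
        return False
    return seq[start - 3] in "AG" and seq[start + 3] == "G"


def longest_orf(seq, require_kozak=True):
    """Return (start, end) of the longest ORF opened by an accepted ATG."""
    best = None
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        if require_kozak and not strong_kozak(seq, start):
            continue
        end = orf_end(seq, start)
        if end is None:
            continue
        if best is None or end - start > best[1] - best[0]:
            best = (start, end)
    return best
```

For a made-up sequence such as `GCCACCATGGCTGCATAACCTATGTGA`, the sketch selects the ATG at index 6, which sits in a favorable context and opens the longer reading frame, over the downstream ATG at index 21.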
As proof of principle, we tried to express three highly expressed human isoforms (h2, h13, and h18) harboring different in-frame methionines with a high Kozak score. The h2 encompassed the complete CERKL sequence, starting at the previously described 5′ UTR, whereas the h13 and h18 cDNAs started at different TSS. The latter two did not contain the first in-frame methionine in exon 1, but they both shared an in-frame Met residue at exon 5 having a high Kozak score. Of note, other out-of-frame methionines located upstream in exon 5 showed comparable Kozak values (Fig. 4A). For each construct the CERKL coding sequence was fused at the 3′ end to an HA epitope to facilitate protein immunodetection. HEK293T cells were transfected with each construct, and RT-PCR was performed to assess the level of the recombinant CERKL transcription. Notably, we observed a high yield of the CERKL protein from 2 of 3 constructs (h2 and h18), each starting from the corresponding highlighted high-scoring Met (Figs. 4B, 4C). Indeed, the size of the expressed CERKL-HA proteins was in agreement with their expected molecular mass (60 and 32 kDa).

Exploring the Promoter Landscape of CERKL TSS
To shed light on the architecture of the CERKL promoters and to define in silico potential novel alternative TSS, we aimed to map conserved transcription factor binding sites (TFBS) on the 1-kb upstream region of every human CERKL exon. To this end, we used position weight matrices from reported general transcription initiation motifs, retina-related transcription factors, and matrices obtained by MEME after analysis of the 49 promoters of RP genes to underscore conserved retina-specific regulatory motifs (subfunctionalized MEMEs) (for a detailed description of these analyses, see Materials and Methods). The outcome of this search along the upstream sequences of every exon depicted three different scenarios that corresponded to the patterns yielded by exons with a TSS function in retina (NeuroD1, 5′ CERKL UTR, 1b, and 3a), exons with TSS not found in the retina (corresponding to the starting exons in the ENCODE transcripts OTTHUMT00000334817 and OTTHUMT00000334818), and the remaining internal exons, which are not used as TSS (Table 1).
Notably, a more focused analysis of the target sites of retina-specific transcription factors revealed several hits that are worth mentioning: a high-scoring hit for PAX6, right upstream of exon 3, and some significant hits for CRX upstream of NEUROD1. However, no hits within the 1-kb upstream region of each exon were found for NR2E3 or V-MAF (used to detect NRL-binding sites), although some were scattered along the CERKL genomic region. Overall, the evidence points to distinct promoter architecture concerning TSS, probably reflecting tissue-specific expression. Supplementary Tables S4A and S4B (http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental) show the detailed list of TFBS, MEME, and subfunctionalized MEME hits upstream of each exon.
Given that CERKL mutations also contribute to CRD, we extended the analysis to the promoters (1 kb upstream) of the CRD genes using the TRANSFAC matrices, with particular emphasis on the retina-related transcription factors. The genes were grouped according to the disease to which they contributed most: RP (already listed), CRD, and a group of genes involved in both retinal disorders. Unfortunately, no clear pattern of single or clustered transcription factor sites, whether general or retina specific, emerged in any of the three gene groups (Supplementary Fig. S2, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental).

Genomic Conservation of the CERKL Region among Vertebrates
VISTA tracks in Figure 2 clearly outline evolutionary conservation of the CERKL syntenic regions among vertebrates (human, Homo sapiens; rhesus macaque, Macaca mulatta; mouse, Mus musculus; chicken, Gallus gallus; and fugu, Takifugu rubripes). The degree of sequence conservation is high, close to 100% between human and rhesus. Among tetrapods, the average degree of conservation is above 70% for all exons but drops significantly in introns and intergenic regions. However, exon 4b could be an innovation in the ape lineage leading to humans because it is unique to the human genome. The comparison with fugu reveals an expected lower degree of conservation because only NEUROD1 exons rank above 70%, whereas most CERKL exons (2, 3, 5, 7, 8, 9, 10, 11, and 12) and ITGA4 exons (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, and 21) range between 50% and 70% similarity. Surprisingly, the CERKL exon 1 was among the least conserved. These results agreed with those obtained from bl2seq comparisons between human and mouse syntenic regions at both the nucleotide (BLASTN) and the translated (TBLASTX) levels. Thus far, no evidence supporting additional exons for CERKL apart from those described in this work could be obtained.

CERKL Expression in a Collection of Human and Mouse Tissues
Semiquantitative RT-PCR analysis of CERKL expression was performed in a collection of tissues and cell lines of human and mouse, with a pair of primers located in the exons shared by all isoforms (forward in exon 9 and reverse in exon 13 in human; exon 12 in mouse; see Figs. 1A2 and 1B2 for locations; details are given in Materials and Methods).The results are shown in Figures 5A and 5C (human) and 5B and 5D (mouse).At least three independent replicates were performed and quantified for each tissue.GAPDH expression was used for normalization.
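As an illustration of this kind of semiquantitative analysis, the Python sketch below normalizes each CERKL band to its GAPDH band, averages the replicates for each tissue, and expresses every tissue relative to the highest one, which is set to 100%. The function name, the input layout, and the intensity values in the usage note are hypothetical and are not data from this work.

```python
from statistics import mean


def relative_expression(bands):
    """Convert band intensities into percentages of the highest tissue.

    bands maps each tissue to a list of (target, reference) intensity pairs,
    one pair per replicate; the reference would be GAPDH here.
    """
    # Normalize every replicate to its own reference band, then average.
    ratios = {
        tissue: mean(target / reference for target, reference in replicates)
        for tissue, replicates in bands.items()
    }
    # Scale so that the tissue with the highest mean ratio equals 100%.
    top = max(ratios.values())
    return {tissue: 100.0 * ratio / top for tissue, ratio in ratios.items()}
```

For example, three made-up replicates with target/GAPDH intensities of (200, 100), (180, 90), and (220, 110) for one tissue and (10, 100), (20, 100), and (30, 100) for another would place the first tissue at 100% and the second at about 10%.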
In humans, the retina was by far the tissue in which CERKL expression was the highest.In fact, among the other tissues, only the brain showed some detectable expression (at levels below 10% of those in retina).Sequence analysis of the brain transcript revealed that gene expression was driven by the NEUROD1 TSS (data not shown).Of interest for future functional studies, some human cell lines showed detectable levels of CERKL transcription, as is the case with HEK293T and A549 (Fig. 5C).
In mouse, Cerkl was also highly expressed in the retina, although the liver showed even slightly higher levels of expression (Fig. 5D). Sequence analysis of the murine liver isoform (marked with an asterisk) showed that it corresponded to the m30in variant. This isoform would generate a prematurely truncated protein because it retained a noncoding fragment of intron 11. Other mouse tissues, such as testis and spleen, also showed high to moderate levels of Cerkl expression.
As mentioned, in addition to the reported mouse Cerkl promoter (hereafter, UTR), retinal transcripts were produced from the NeuroD1 promoter and an internal TSS in intron 2 (named 3a). Direct sequencing of RT-PCRs from other tissues led us to identify another TSS, 3b, also within intron 2. We performed RT-PCR assays to assess the relative contribution of these TSS (UTR, NeuroD1, 3a, and 3b) to Cerkl expression in each tissue (Fig. 6). Tissular comparison showed a wide range of expression from each TSS: the Cerkl UTR contribution was indeed major in the retina, moderate in the kidney, faint in the brain, and undetectable in the blood and spleen. In addition, in agreement with previous reports, NeuroD1-driven expression was tissue restricted and was observed only in the retina in our panel. In contrast, the 3a TSS-driven transcript was expressed more widely but showed very low levels in the retina. Although the 3b TSS was silent in the mouse retina, it was the most active in the liver (Fig. 6). Isoforms m24 to m28 in Figure 1B, which started at either 3a or 3b TSS, were isolated and sequenced in the spleen and liver but were undetectable in the retina. Of note, in some tissues, the RT-PCRs specific for these four promoters did not explain the total Cerkl transcriptional levels (as revealed by the amplification of the exon 9 to 12 region common to all isoforms), again pointing to additional TSS.

Cerkl Localization in Mouse Retina by Immunohistochemistry
Previous results based on in situ mRNA hybridization showed that Cerkl was expressed mainly in the ganglion cell layer, though a fainter level of expression was detected in other retinal layers, including photoreceptors.17 To accurately assess the localization of the Cerkl protein in the retina, fluorescent immunohistochemistry using different cell-specific antibodies and markers was performed on serial sagittal cryosections of adult mouse retinas (2 months old). An in-house rabbit polyclonal anti-Cerkl antibody raised against an exon 2 peptide sequence was affinity purified and preabsorbed before use. Double coimmunodetection with this polyclonal anti-Cerkl antibody and either anti-rhodopsin (specific for rods) or anti-PKCα (which primarily labels bipolar cells and rods), plus counterstaining with DAPI (nuclei) and Alexa Fluor 647-conjugated PNA (which labels cones), was performed in parallel to allow a more detailed localization (Fig. 7). Cerkl expression was found at the ganglion cell layer (GCL), in the photoreceptors (PhR), and in some cell bodies at the outer nuclear layer (ONL) and inner nuclear layer (INL) (Fig. 7). Magnification of the photoreceptor cell layer showed a strong immunodetection of Cerkl in cones and, faintly, in rods. Of interest, Cerkl localized primarily in the outer segments of both types of photoreceptors, as shown by its colocalization with rhodopsin (rods) and cone staining (Figs. 7H, 7I). In addition, Cerkl showed perinuclear staining in some cell bodies at the ONL, extremely close to the photoreceptor layers, probably corresponding to cones (Figs. 7I, 7J, white arrows). Concerning other neuronal retinal types, Cerkl was detected in a population of bipolar cells (white arrowheads in Fig. 7N) as well as in other cell types at the INL, as yet undetermined.

DISCUSSION
One of the major breakthroughs from interspecific sequence comparisons of whole genomes is that the complexity of a particular organism depends not only on the number of genes but also on the diversity of the proteins produced and the regulation of transcription. An increasing amount of evidence in the human genome supports that alternative splicing is more the rule than the exception because >95% of the multiexon genes undergo alternative splicing events, often related to developmental or tissue differentiation processes and differential physiological functions. Many bioinformatic efforts are now being devoted to decipher "the splicing code," which is intended to characterize the regulatory splicing strategies on a genomewide scale to predict the specific transcripts from every gene.40,41 However, these in silico predictions must be substantiated in vivo to identify the physiologically relevant isoforms, their regulation, and eventually their contribution to disease. Within this framework, we have combined both in vivo and in silico approaches to analyze the expression of CERKL, a retinitis pigmentosa gene of an as yet unknown function. Our data show unexpectedly high transcriptional complexity in human and mouse tissues arising from the combination of tissue-specific promoters and alternative splicing events, particularly in the retina. A large multiplicity of retina transcripts has also been reported for other genes, such as RPGR, RPGRIP1, and CPEB3.42-44 In agreement with these results, a recent accurate transcriptional characterization focused on the PRPF gene family (proteins associated with spliceosome formation and responsible for retinal dystrophies) showed that the processed pre-mRNA levels were higher in the retina than in other tissues and organs. Their results pointed to a particularly increased splicing activity at the base of the high multiplicity of retinal transcripts and called for sophisticated quality control mechanisms.45
FIGURE 5. CERKL semiquantitative expression analysis in human and mouse tissues. CERKL expression identified by RT-PCR in several tissues and cell lines of human (A) and mouse (B) origin. Semiquantitative analysis of all CERKL transcripts in human (C) and mouse (D). At least three replicates were performed. GAPDH expression was used for normalization. Maximum CERKL levels were arbitrarily set as 100% (retina in human, liver in mouse). CERKL was amplified using primers A and B in human and primers a and b in mouse, as located in Figures 1A2 and 1B2. The amplicon size is indicated in each case. The asterisk in the murine liver sample corresponds to the alternative isoform m30in. Primer sequences are provided in Supplementary Table S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental. Notably, the primers used for the amplification of the CERKL transcript were located in the common region at the 3′ end of the gene; therefore, the bands observed are the result of the transcripts produced from all TSS in each tissue.
FIGURE 6. Tissue-specific Cerkl promoter in adult mice. RT-PCRs were performed on several murine samples to determine the active promoters in each tissue. Forty-five cycle amplifications were carried out using the same reverse oligonucleotide in exon 12 and different forward primers located in each TSS identified (NeuroD1 UTR, Cerkl UTR, 3a, and 3b) as well as exon 9 to amplify the common region. Gapdh was used to normalize between samples. Primer location is depicted in Figure 1B, and sequences are listed in Supplementary Table S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental.
This high repertoire of CERKL transcript and protein isoforms suggests distinct roles for the alternatively displayed domains. The first two exons of CERKL encode a PH domain and two nuclear localization signals, whereas exons 3 to 7 encompass the DAGK domain 17-19,38,39 (Fig. 8). Notably, the use of the different promoters and 5′ UTRs affects the inclusion/exclusion of the first exons in the final transcript and generates variability at the N-terminal peptide moiety, with a potential impact on the protein function, which supports a finely tuned regulation of the 5′ splicing events. In contrast, the exons encoding the C-terminal domains are maintained in all isoforms, even in the transcripts from nonretinal tissues, arguing in favor of a basic function.
The comparison between human and mouse retina major CERKL isoforms reveals higher complexity for the human transcripts. In fact, the most abundant isoforms are species-specific (except for h2 and m1, which are structurally equivalent). For example, the NeuroD1 promoter contributes to the highly expressed isoforms in mouse, whereas its relevance in the human adult retina appears to be minor. This also holds true for the least abundant isoforms (e.g., h1, h12, h15-h17 and m5, m7, m8, m10, m11, m13) (Fig. 1A2, 1B2). Interspecific differences in the levels of expression and identification of species-specific isoforms have also been reported for other visual disorder genes, such as IMPDH1, OPA1, and PRPF31, suggesting distinct functional requirements for each species [46-48]. Remarkably, one-third of the murine isoforms (12 of 32) compared with 1 of 21 human isoforms are produced by missplicing (with partial retention of intron sequences). Most of these misspliced transcripts would encode a truncated protein, unless degraded by NMD. Other reports analyzing human versus murine transcripts identified other retinal dystrophy genes with preferential or unique intron retention in the mouse, among them RPGR (intron 14) [49], RPGRIP1 (intron 13) [42], and PRPF31 (intron 7) [48]. If extended to other genes, these results would argue in favor of either a more precise splicing machinery or a less permissive mRNA integrity control in human, at least in the retina, even though it has been shown that relevant splicing events associated with NMD remain conserved through mammalian genomes, reflecting a common clearance mechanism of transcripts that might compromise cell viability [50].
One of the relevant findings of our work is the use of tissue-specific TSS in mouse. Among the tissues analyzed, the NeuroD1 promoter was only active in the retina, where the reported Cerkl UTR promoter also showed the highest transcriptional activity. Instead, the additional alternative internal promoters were highly expressed in nonneuronal tissues (Fig. 6, liver, testis, kidney). The combination of different promoters and shared splicing events in both species hindered isoform quantification by real-time RT-PCR (which relies on small probes) to evaluate their contribution to the CERKL transcript population. Thus, a relative quantification by specific amplification of each isoform was performed (Figs. 3, 5, 6). Of note, the retina is the tissue in which the highest expression and the greatest CERKL transcript variability are observed. In addition to the multiplicity of promoters and alternative splicing events, another layer of complexity is provided by other in-frame methionines, which direct the synthesis of shortened CERKL protein isoforms, with a downstream start in exon 5. In vitro experimental evidence strongly supports this starting Met in the h18 isoform, though no expression could be detected for the h13 variant. Whether this apparent discrepancy could be explained by other upstream out-of-frame methionines in h13 (not present on the h18 alternative 5′UTR exons) that affect the translational initiation complex formation remains to be elucidated (Fig. 4). Although the CERKL function is as yet undetermined, such a high repertoire of transcripts and proteins, while making functional assignment a real challenge, hints at a very crucial role in retinal cell survival. Indeed, in a more general view, these results open new scenarios for the human proteome complexity associated with a multiplicity of isoforms.
The in silico analysis of binding sites for transcription initiation factors across the CERKL genomic neighborhood (approximately 230 kbp) revealed a high number of hits (>15,000). However, they were not randomly distributed but were clustered just upstream of the ITGA4, NEUROD1, and CERKL canonical TSS. If we focus on the retina-specific TFBS, no significant scores for OTX2 or NR2E3 could be found upstream of the promoters of these genes. In contrast, binding sites for CRX, PAX6, and NRL upstream of the NEUROD1 TSS, for NRL in CERKL exon 1b, and for PAX6 and CRX upstream of the exon 3 TSS were identified. These results provide evidence for retina-specific regulatory enhancers close to CERKL. Overall, the differential patterns observed for the in silico predicted enhancers, the TSS experimentally confirmed in the retina, and the identification of nonretinal transcriptional products clearly support a highly tuned, tissue-specific regulation of CERKL expression.
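As a rough illustration of the kind of matrix scan described above, the following Python sketch scores every window of a sequence against a position weight matrix, keeps windows at or above a cutoff, and discards hits that overlap homopolymer stretches longer than 5 bp (the filter mentioned in the Figure 2 legend). The count matrix, the cutoff, the pseudocount, and all function names are hypothetical and chosen only for illustration; they are not the matrices or thresholds used in this study.

```python
import math
import re

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def homopolymer_spans(seq, max_run=5):
    """Return (start, end) spans of single-base runs longer than max_run bp."""
    return [m.span() for m in re.finditer(r"A+|C+|G+|T+", seq)
            if m.end() - m.start() > max_run]

def scan(seq, matrix, cutoff, max_run=5):
    """Return (start, site, score) for windows scoring at or above cutoff."""
    seq = seq.upper()
    width = len(matrix)
    masked = homopolymer_spans(seq, max_run)
    hits = []
    for start in range(len(seq) - width + 1):
        end = start + width
        window = seq[start:end]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        if any(start < m_end and m_start < end for m_start, m_end in masked):
            continue  # hit overlaps a long homopolymer stretch
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, window, round(score, 3)))
    return hits

# Toy count matrix for a made-up 4-bp motif "GAAA" (not a real TFBS matrix).
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
toy_hits = scan("CCGAAATCCGAAAAAAATT", toy_matrix, cutoff=5.0)
```

In this toy sequence the motif occurs twice, but the second occurrence runs into a 7-bp stretch of adenines and is therefore dropped, leaving only the site at offset 2. Clustering of the surviving hits relative to annotated TSS would be a separate step not shown here.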
Notably, Cerkl immunohistochemistry showed high expression in cones and moderate expression in rods, ganglion cells, and other retinal INL cell types. A specific perinuclear staining was observed at the INL and ONL. Hitherto, CERKL mutations have been associated with both conventional RP and CRD. Regarding this clinical heterogeneity, our findings of expression in cones and rods are consistent with the two clinical entities but also highlight the need to establish a more accurate scenario. Therefore, full characterization of the transcriptional map of isoforms, the type and location of the mutations, the accurate subcellular localization of proteins, and the action of modifier genes is required to comprehend the contribution of CERKL/CERKL variants to retinal degeneration disorders.
To establish a more precise relationship between mutations and the relative pathogenicity of each isoform, the distribution of SNPs along the coding gene structure was analyzed in silico. A priori, a homogeneous distribution of both mutations and SNPs should be expected when all the exons and encoded domains contribute equally to function. The results of this analysis showed that not all the domains harbored the same frequency of SNPs because some showed higher SNP frequencies than the observed average, whereas others were devoid of polymorphic variants, thus suggesting differential selection pressures (Supplementary Fig. S3, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental). For instance, the alternatively spliced exons that encompassed the DAGK domain contained fewer SNPs, whereas the exons that encoded the pleckstrin homology domain showed more SNPs than average. The biological meaning of this differential distribution remains to be assessed.
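The comparison of per-exon SNP frequencies against the overall average can be sketched as follows. This is a minimal, purely descriptive Python example under stated assumptions: the exon coordinates and SNP positions are invented toy values, the function names are hypothetical, and no attempt is made to test whether a deviation from the average is statistically significant.

```python
def snp_density(exons, snp_positions):
    """Return SNPs per kb for each exon and for all exons combined.

    exons: dict mapping exon name -> (start, end), 1-based inclusive.
    snp_positions: iterable of 1-based SNP coordinates.
    """
    snps = list(snp_positions)
    per_exon = {}
    total_count = 0
    total_length = 0
    for name, (start, end) in exons.items():
        length = end - start + 1
        count = sum(1 for pos in snps if start <= pos <= end)
        per_exon[name] = 1000.0 * count / length
        total_count += count
        total_length += length
    return per_exon, 1000.0 * total_count / total_length

def label_exons(per_exon, average):
    """Label each exon relative to the average exonic SNP density."""
    labels = {}
    for name, density in per_exon.items():
        if density == 0:
            labels[name] = "no SNPs"
        elif density > average:
            labels[name] = "above average"
        else:
            labels[name] = "at or below average"
    return labels

# Toy data only: three invented exons and six invented SNP positions.
toy_exons = {"exon_a": (1, 200), "exon_b": (301, 500), "exon_c": (601, 700)}
toy_snps = [10, 50, 120, 180, 350, 800]  # position 800 lies outside all exons
densities, average = snp_density(toy_exons, toy_snps)
labels = label_exons(densities, average)
```

With these toy values, exon_a carries 20 SNPs per kb against an exonic average of 10 per kb, exon_b carries 5 per kb, and exon_c carries none, reproducing in miniature the three situations described in the text (above average, below average, and devoid of variants).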
Meanwhile, as more mutations are being identified, a genotype-phenotype correlation pattern is emerging (Fig. 8, Table 2). The first pathogenic variant described, p.R257X, a nonsense homozygous mutation in exon 5, generates a truncated protein that abrogates the putative DAGK domain. Interestingly, only 1 of the 8 major isoforms remains unaffected after alternative splicing. The phenotype associated with this variant ranges from canonical RP to more severe CRD features [53]. Another RP-associated mutation, p.R106S, is localized in 1 of the 2 putative nuclear localization signals, probably compromising its import and function in the nucleus [55]. However, all other protein domains remain unaltered, in accordance with a moderate RP phenotype. Other alleles are associated with more severe retinal disorders, with clear cone-rod dystrophy features and early macular degeneration. One of them, c.238+1G>A [37], affects the splicing of the first intron, abrogating the generation of the putative protein isoforms produced from exon 1 and 1b. Thus, only the isoforms starting in exon 3 or the spliced variants of exon 1a would be produced. The other mutation, p.C125W [54] (... methionine in exon 1), changes an evolutionarily conserved cysteine residue of the pleckstrin domain. Three other clearly pathogenic alleles, two frameshifts (by indels) and a nonsense mutation, have also been reported, but their association with particular features is hindered by their compound heterozygous status. Indeed, this is an ongoing task. Our comprehensive approach, by characterizing a high number of isoforms expressed in a single tissue, provides an exhaustive transcriptional picture on a hitherto fragmentary collection of data and builds a framework to assess the severity of new mutations. Considering the high number of CERKL isoforms, undertaking accurate analysis for localization or functional specificity, or both, at the subcellular level remains a key challenge to understand the contribution of this gene to retinal degeneration.

* The isoform cDNA used for reference is NM_201548.4, corresponding to isoform h2 (Fig. 1A). NC, not considered due to heterozygosity.
† Only the phenotype for homozygous allelic combination is considered of value in genotype-phenotype correlations. NC, not considered.
‡ The reported mutation p.R283X, considered as novel in this study, corresponds to the already reported p.R257X variant (this difference is due to the isoform cDNA taken as reference: NM_001030311.2 and NM_201548.4, respectively).
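The table footnote on p.R283X and p.R257X illustrates how the same variant receives different names depending on the reference isoform. The offset implied by these two names is 283 − 257 = 26 residues. The Python sketch below renumbers a simple protein-level variant name by such an offset. It assumes that the offset is constant for the positions being converted, which the text does not state, and the function name and the restricted name pattern (single-letter substitutions only) are hypothetical simplifications.

```python
import re

# Matches simple single-letter substitutions such as p.R257X or p.C125W.
_VARIANT = re.compile(r"^p\.([A-Z])(\d+)([A-Z*])$")

def shift_variant(name, offset):
    """Renumber a protein variant name by a fixed residue offset."""
    match = _VARIANT.match(name)
    if match is None:
        raise ValueError(f"unsupported variant name: {name!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    new_pos = pos + offset
    if new_pos < 1:
        raise ValueError("offset moves the residue before position 1")
    return f"p.{ref}{new_pos}{alt}"

# Offset implied by the two names given in the footnote (an assumption
# that it applies uniformly to the positions being converted).
OFFSET = 283 - 257

converted = shift_variant("p.R257X", OFFSET)
restored = shift_variant(converted, -OFFSET)
```

Here `converted` is "p.R283X" and `restored` is "p.R257X". Positions lying upstream of the sequence difference between the two reference transcripts would not shift, so a real conversion would need the transcript alignment rather than a single constant.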

FIGURE 1. Alternatively spliced CERKL isoforms in human and mouse retina. Extremely high complexity of the splicing events in human (A1) and mouse (B1) CERKL transcripts. Open boxes: exons. Filled boxes: retained introns or cryptic noncoding exons. Angled lines above and below the gene structure indicate validated splicing events. Scheme depicting all the human (A2) and mouse (B2) spliced variants observed in the retina. Exons are indicated as boxes and the coding sequence (CDS) for each isoform, considering the higher likelihood of first methionine, is shown in black. Dark gray: TSS found in retina. Light gray: nonretinal TSS. # Main isoforms in each species. Arrows: letters indicate the position and direction of the primers used for PCR reactions (complete list and sequence in Supplementary Table S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental). ^ Nonretinal isoforms found in mouse liver and spleen. The scores of the Kozak's motif hits containing putative TIS methionines for human are: ૺ 12.003; OE 5.248; f 8.389; ࡗ 5.281; F 8.852. For mouse they are: 13.384; E 9.620; * 10.662; छ 8.389; ¤ 8.863 (the complete list of all Kozak's scores is contained in Supplementary Table S5, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental).

FIGURE 2. Summary of annotated and custom feature tracks on the UCSC genome browser. (A) An overall view of the whole genomic neighborhood of human CERKL, including upstream NEUROD1 (ITGA4 downstream gene is shown in Supplementary Fig. S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental). Homology to various species, including mouse, is depicted on the topmost tracks. Exonic structure of all the experimentally validated CERKL isoforms described in this article. FirstExonFinder predicted TSS; the ENCODE histone track H3K4Me3, a custom track of hits to different position weight matrices for known and predicted transcription factor binding sites, and some further evidence of transcriptional activity on neural tissues are shown. (B, C) Magnifications of the regions around exons 1 and 3, respectively, containing a more detailed view of the TFBS. The same track distribution is depicted on all three panels. Matrix hits overlapping homopolymer stretches larger than 5 bp were discarded.

FIGURE 3. Evaluation of CERKL main transcripts. RT-PCR from human (A) and mouse (B) retina total RNA, to identify the main isoforms. (C) Scheme depicting the structure of CERKL in human and mouse, with the location of the primers used to generate the PCR reactions. For the sake of clarity, exons not relevant to this assay are not shown. For all amplicons, the same reverse oligonucleotide (human: O; mouse: b) was used, paired with the corresponding forward primers. For human: lane 1, C; lane 2, D; lane 3, E; lane 4, F; lane 5, G; lane 6, I; lane 7, J; lane 8, K. For mouse: lane 1, f; lane 2, g; lane 3, d; lane 4, h; lane 5, I; lane 6, j; lane 7, e; lane 8, k. Primer sequences are provided in Supplementary Table S1, http://www.iovs.org/lookup/suppl/doi:10.1167/iovs.10-7101/-/DCSupplemental.

FIGURE 4. Evidence for additional initiating methionines in alternatively spliced human CERKL isoforms. (A) Diagram of the three different HA-tagged constructs from isoforms h2, h8, and h13, as well as the structure and molecular mass of the predicted encoded proteins. Methionines showing high Kozak scores are indicated by an asterisk (methionine in exon 1) and a filled triangle (internal methionine in exon 5), whereas other out-of-frame significantly scored Met are marked with a cross. Filled boxes: putative CDS. (B) RT-PCR showing expression of the CERKL constructs in transfected HEK293T cells. Lower endogenous CERKL levels were also detected in nontransfected cells (Ø). GAPDH gene was used for normalization. (C) CERKL-HA-fused proteins were immunodetected with an anti-HA monoclonal antibody. α-Tubulin was used as a loading control.

FIGURE 7. Immunohistochemistry on mouse retina cryosections. (A-J) Localization of Cerkl in photoreceptor cells. Nuclei are stained with DAPI (blue, A); Cerkl (B) and Rhodopsin (C) proteins are detected in green and magenta, respectively; cones appear in red (D) using PNA staining. Two merged images (E, F) and the magnification of some sections show clear localization of Cerkl in cones (yellow, G) and, more faintly, in rods, colocalizing with rhodopsin (H). Although Cerkl localizes mainly in the outer segments, some perinuclear staining could also be observed in the nuclei of the cones at the ONL, indicated by white arrows (I, J). (J) DAPI counterstaining of the nuclei. (K-N) Expression of Cerkl in other retinal layers. Nuclei are stained with DAPI (blue, K), Cerkl protein is detected in green (L), bipolar cells and rods expressing PKCα are labeled in red (M). Cerkl is expressed in the ganglion cells (GCL), some cells in the INL and ONL, and in the photoreceptors. The merged image (N) shows expression of Cerkl in some bipolar cells (white arrowheads) while confirming localization in rods. Scale bars show magnifications.

FIGURE 8. Scheme of the reported causative mutations on the CERKL gene. The location of the mutations identified thus far is shown on a diagram of the CERKL protein. The CERKL domains described by either sequence homology (PH, pleckstrin; DAGK, diacylglycerol kinase domain) or functional analysis (NLS, nuclear localization signals; NES, nuclear export signals) are also depicted.

TABLE 1. Distribution of Motifs among 1 kbp Upstream of Every CERKL Exon Showed a Differential Pattern, Depending on the Kind of Exon