Validation of Suspected Somatic Single Nucleotide Variations in the Brain of Alzheimer’s Disease Patients

Next-generation sequencing techniques and genome-wide association study analyses have provided a vast amount of data, enabling the identification of DNA variations and mutations related to disease pathogenesis. New techniques and software tools have been developed to improve the accuracy and reliability of this identification, and most of them have been designed to discover and validate single nucleotide variants (SNVs). However, in addition to germline mutations, human tissues display genomic mosaicism: somatic events may be present in only a small percentage of the cells in a given tissue, which hinders their validation with standard genetic tools. Here we propose a new method to validate some of these somatic mutations. We combine recently developed software with a method that uses restriction enzymes to cut DNA molecules lacking the variation at the variant site. The non-cleaved molecules, which bear the SNV, can then be amplified and sequenced using Sanger's technique. This procedure, which allows the detection of alternative alleles present in as few as 10% of cells, could be of value for the identification and validation of low frequency somatic events in a variety of tissues and diseases.


INTRODUCTION
A small number of Alzheimer's disease (AD) cases are caused by an inherited mutation in one of three genes (APP, PSEN1, and PSEN2) [1]. However, the vast majority of patients have the sporadic form of the disease, which is associated with several genetic and non-genetic risk factors [2]. Since somatic mutations have been identified in single human neurons [3,4], it has been proposed that somatic gene variations in brain cells facilitate the development of the disease [5,6]. As in other disorders such as cancer, these variations can arise during development or aging, and they may represent mosaic genomic heterogeneity [7].
Confirming somatic gene variations with standard procedures such as Sanger sequencing is difficult when the cells bearing the variation are few compared with the rest of the cells. Indeed, Sanger sequencing detects somatic gene variations in a sample only when their allelic frequency is 20% or higher [8]. Other techniques, such as Illumina sequencing, have therefore been tested [6]. However, this method can introduce read-alignment errors that interfere with the identification of true somatic variations [9,10]. Several approaches have emerged to reduce these errors [5,10,11]. Nevertheless, Sanger sequencing remains difficult to apply for validation when variations are present in only a few cells, or when two tissues from the same individual are compared and only one of them bears the somatic variation.
Here we developed a new method, combining Illumina sequencing with a validation step, that allows the characterization of somatic variations present in a few brain cells and absent from other tissues, such as blood, of the same subject. Accurately identifying a somatic SNV in the exome of a specific cell is difficult when that cell is surrounded by many cells lacking the SNV. New methods have recently been proposed [5,10-14], and new software, such as Virmid [15], has been developed to detect somatic mutations despite sample impurity and to avoid errors caused by unsuitable read alignments. The main aim of this work was therefore to develop a new method for the analysis of somatic variants.
In the present study, we describe a method to identify brain-specific somatic SNVs in the exome by Illumina sequencing. The method is based on: a) the use of Virmid software [15] for data processing; b) the removal of DNA lacking the SNV of interest by digestion with a specific restriction nuclease; and c) amplification and validation of the DNA containing the SNV by Sanger sequencing. The enrichment step relies on restriction endonucleases whose recognition site is present only when the SNV of interest is absent, so that the molecules lacking the SNV are cleaved. Finally, the uncleaved DNA bearing the SNV is amplified and subjected to Sanger sequencing.
A limitation of this procedure is that a restriction nuclease suitable for eliminating bulk DNA is not always available; however, when the nuclease step could be performed, there was good correlation between the bioinformatics processing and the data obtained by Sanger sequencing after nuclease treatment. This finding indicates that the method is suitable to identify brain SNVs in neurodegenerative disorders such as AD and, more generally, SNVs present at low proportions in a particular tissue.

MATERIALS AND METHODS
Samples from donors were obtained from the Spanish Brain Bank (Banco de Tejidos CIEN [BT-CIEN], http://bt.fundacioncien.es/) and from the Biobanco del Sistema Sanitario Público de Andalucía (http://www.juntadeandalucia.es/salud/biobanco/). All donors gave written informed consent while alive. The tissues were obtained using protocols approved by the ethics committees of these two organizations. Our work was previously approved by the ethics committee of our center (Comité de Ética de la Investigación conjunto CNB-CBMSO, http://www.cnb.csic.es/cei/). The methods were carried out in accordance with the approved guidelines. DNA was extracted using Qiagen kits following the manufacturer's instructions.
The list of donors is indicated in Supplementary Table 1.

Tissue sample preparation
Hippocampal tissues from donors were obtained as follows: postmortem tissues were obtained through a rapid pathological autopsy shortly after death. The postmortem interval was 3 h. Immediately after the autopsy, the fresh tissues were flash-frozen in isopentane at -50 °C and then placed in a -80 °C freezer for long-term preservation. Frozen tissue samples from various brain regions were obtained from the corresponding slices after 2 h of temperature equilibration. Each sample was collected with sterile disposable material, placed into sterile cryo-tubes, and kept at -80 °C. Blood samples were obtained simultaneously with routine blood extraction.

Legend to Table 1: This table shows some of the additional somatic variations that we discovered with Virmid software in hippocampus samples, but not in blood. All of the mutations in this table are in genes that have hippocampus-specific SNVs, as indicated previously [6], and lie within the recognition site of a restriction endonuclease, such that the mutation modifies that site. The table indicates the position of the mutation (according to the hg19 genome assembly), the gene in which it is found, its ID in dbSNP (none of them were previously identified in blood DNA samples), the resulting amino acid change, the enzyme that recognizes the site, and the percentage of reads carrying the mutation.

DNA isolation
All genomic DNA samples were isolated from blood or from the hippocampus, using Qiagen kits (DNeasy Blood and Tissue, ref: 69504) and following the manufacturer's instructions.
Sample processing for exome sequencing
Genomic DNA (3 µg) was fragmented to an average size of 200 bp using a Covaris LE220 instrument. Short-insert libraries were prepared using the Illumina TruSeq DNA Sample Preparation Kit. Exonic sequences were enriched using the NimbleGen Sequence Capture Human Exome 2.1M Array. Paired-end reads of 91 nucleotides from each end were generated to an average coverage of 50x using an Illumina HiSeq 2000 instrument. Reads were generated in FASTQ format.
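At an average coverage of 50x, a variant carried by only a small fraction of the DNA molecules is supported by few reads. The sketch below treats the number of alternative-allele reads at a site as binomially distributed, which is a simplifying assumption (independent sampling of alleles, no sequencing or alignment error). The minimum of three supporting reads is a hypothetical threshold, not a parameter from this study; the read fractions 0.09 and 0.06 are those reported later for the COL3A1 and LRBA variants.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, vaf: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf).

    Chance of seeing at least `k` alternative-allele reads at a site covered
    by `depth` reads, assuming each read samples an allele independently and
    ignoring sequencing and alignment errors.
    """
    return sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i)
        for i in range(k, depth + 1)
    )


def expected_alt_reads(depth: int, vaf: float) -> float:
    """Mean number of alternative-allele reads at a given depth."""
    return depth * vaf


if __name__ == "__main__":
    depth = 50    # average exome coverage reported above
    min_alt = 3   # hypothetical caller threshold, not from the study
    for vaf in (0.50, 0.10, 0.09, 0.06):
        p = prob_at_least_k_alt_reads(depth, vaf, min_alt)
        print(f"VAF {vaf:.2f}: mean alt reads {expected_alt_reads(depth, vaf):.1f}, "
              f"P(>= {min_alt} alt reads) = {p:.3f}")
```

Even under these idealized assumptions, a variant in 6% of reads is expected to be supported by only about three reads at 50x, which illustrates why such calls benefit from independent validation.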

Oligonucleotides used for DNA amplification (PCR)
The oligonucleotides listed in Table 2 were used for sequence amplification. Concentrations from 900 fM to 900 pM were used (see Fig. 6). DNA for PCR was purified with the Wizard® Plus SV Minipreps DNA Purification System (Ref. A1330), following the manufacturer's guidelines.

Amplification of fragments bearing SNVs
For the experiment shown in Fig. 6, a modified version of TruePrime™ technology from Sygnis™ was used as an alternative way to amplify and specifically enrich the samples containing the SNV of interest. TruePrime™ is a multiple displacement amplification (MDA) technology based on the combination of φ29 DNA polymerase [16,17] and TthPrimPol, a DNA primase that synthesizes DNA primers for φ29 DNA polymerase during the reaction, thus allowing the amplification of very low concentrations of DNA (Picher et al., submitted). The undigested DNA fragments (100 pg) containing the SNV were denatured (10 min at 95 °C) and ligated (60 min at 60 °C) to form circular single-stranded DNA molecules using Epicentre® CircLigase™ II (100 units in a final reaction volume of 20 µl). Next, these circular DNA molecules were amplified by TruePrime™ rolling circle amplification in the presence of the hexanucleotide 5'-ACCAAT-3', derived from the motif for nuclease EcoO109I (5'-ACCAAG-3'). To specifically enrich the amplification products in molecules bearing the SNV of interest, the G in this motif was changed to T, the SNV found in the AD patient. An oligonucleotide complementary to the reverse strand found in the AD patient (5'-GGGTCA-3') was also added. Briefly, the circularized DNA fragments (10 pg in 2.5 µl) were first denatured by adding 2.5 µl of buffer D and incubating for 3 min at room temperature. The samples were then neutralized by adding 2.5 µl of buffer N.
The amplification mix, containing 26.8 µl of H2O, 5 µl of reaction buffer, 5 µl of dNTPs, 5 µl of Enzyme 1 (TthPrimPol), and 0.7 µl of Enzyme 2 (φ29 DNA polymerase), was added to the DNA samples, resulting in a final reaction volume of 50 µl. When indicated, 90 pM of each oligo (5'-ACCAAG-3' and 5'-GGGTCA-3') was added to the reaction. Reaction mixtures were incubated for 3 h at 30 °C, and φ29 DNA polymerase was then inactivated for 10 min at 65 °C to avoid degradation of the amplification products. The amount of amplified DNA obtained was approximately 1 µg, as determined by the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen, Life Technologies, Carlsbad, CA, USA). In some cases, the mixture was first incubated for 1 h with φ29 DNA polymerase in the absence of TthPrimPol (the TruePrime™ primase), followed by a 2-h incubation in the presence of both enzymes.
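The reaction volumes above can be checked with simple bookkeeping: the DNA side (template plus buffers D and N) and the amplification mix should add up to the stated 50 µl. The sketch below also converts the oligo concentration into an amount per reaction, assuming that "90 pM" refers to the final concentration in the 50-µl reaction (the text does not say so explicitly). The dictionary and function names are illustrative only.

```python
# Volumes in microlitres, copied from the protocol above.
DNA_SIDE = {"circularized DNA": 2.5, "buffer D": 2.5, "buffer N": 2.5}
AMPLIFICATION_MIX = {
    "H2O": 26.8,
    "reaction buffer": 5.0,
    "dNTPs": 5.0,
    "Enzyme 1 (TthPrimPol)": 5.0,
    "Enzyme 2 (phi29 DNA polymerase)": 0.7,
}


def total_volume_ul(*parts: dict[str, float]) -> float:
    """Sum the volumes of every component in every part of the reaction."""
    return sum(sum(part.values()) for part in parts)


def femtomoles(conc_pm: float, volume_ul: float) -> float:
    """Amount of oligo (fmol) at `conc_pm` picomolar in `volume_ul` microlitres.

    1 pM x 1 µl = 1e-12 mol/l x 1e-6 l = 1e-18 mol = 1e-3 fmol.
    """
    return conc_pm * volume_ul / 1000.0


if __name__ == "__main__":
    total = total_volume_ul(DNA_SIDE, AMPLIFICATION_MIX)
    print(f"final volume: {total:.1f} µl (protocol states 50 µl)")
    print(f"each oligo at 90 pM: {femtomoles(90, total):.1f} fmol per reaction")
```

The 7.5 µl of DNA-side components plus 42.5 µl of amplification mix give the 50-µl final volume reported, and 90 pM in that volume corresponds to 4.5 fmol of each oligo.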

Sanger sequencing
Sanger sequencing of purified PCR products was performed using Applied Biosystems (ABI) 3720xl sequencers at GATC Biotech (Cologne, Germany).

Bioinformatics analysis
We proceeded in two different ways depending on the type of SNV sought: well-defined germline SNVs, which are present in every cell with a homozygous or heterozygous genotype (expected allele fractions of 100% and 50%, respectively), or somatic SNVs, which are present in only a subset of cells and therefore at allele fractions below 50%.
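The distinction above can be expressed as expected variant allele fractions (VAF). In a diploid tissue, a heterozygous variant carried by a fraction c of the cells is expected at a VAF of c/2, and a homozygous one at c. The sketch below encodes this and adds a crude read-count classifier; the tolerance of 0.15 is a hypothetical margin chosen for illustration, not a threshold used in this study, and real callers model sequencing error and depth explicitly.

```python
def expected_vaf(cell_fraction: float, zygosity: str = "het") -> float:
    """Expected variant allele fraction (VAF) in a diploid tissue.

    `cell_fraction` is the proportion of cells carrying the variant;
    'het' means one of the two alleles is mutated, 'hom' means both.
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must lie between 0 and 1")
    mutated_copies = {"het": 1, "hom": 2}[zygosity]
    return cell_fraction * mutated_copies / 2


def classify_vaf(alt_reads: int, depth: int, tol: float = 0.15) -> str:
    """Crude genotype label from read counts; `tol` is a hypothetical margin."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if alt_reads == 0:
        return "reference"
    vaf = alt_reads / depth
    if vaf >= 1.0 - tol:
        return "homozygous"
    if abs(vaf - 0.5) <= tol:
        return "heterozygous"
    if vaf < 0.5 - tol:
        return "candidate somatic"
    return "ambiguous"


if __name__ == "__main__":
    print(expected_vaf(0.2))     # heterozygous SNV in 20% of cells
    print(classify_vaf(25, 50))  # germline heterozygous pattern
    print(classify_vaf(5, 50))   # low-VAF pattern typical of somatic events
```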

Detection of SNVs
Reads were aligned to the human reference genome version hg19 [18] using the BWA aligner [19] with default parameters. For each case, all samples were pre-processed with Picard to remove duplicate reads (http://picard.sourceforge.net/). Local realignment around indels was performed to improve SNV calling in these problematic regions (IndelRealigner from the Genome Analysis Toolkit, GATK, version 2.1-8 [20]). Base quality scores were recalibrated using BaseRecalibrator from GATK. The GATK UnifiedGenotyper algorithm was then used with default parameters (see [21,22] for details) to call SNVs, yielding a first file of raw calls. We then separated the indels from the rest of the calls and considered only SNVs for the analysis. These variants were filtered with GATK VariantFiltration, requiring a minimum coverage of DP > 10, DP > 20, DP > 50, or DP > 100, depending on the case of study, and excluding calls with QD < 2.0, FS > 60.0, MQ < 35.0, HaplotypeScore > 13.0, MQRankSum < -12.5, or ReadPosRankSum < -8.0. Only calls that passed these filters were retained. Variants were annotated using dbSNP version 138 [23], the UCSC human RefGene annotation [24], and SnpEff (version 2.0.5) [25]. VCFtools [26] was used to manipulate the variant files and to determine how many variations were unique to, or shared between, different tissues.
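The hard filters listed above can be mirrored in a few lines of Python for readers who want to re-apply them to annotated calls outside GATK. This is a minimal sketch, assuming each call's INFO annotations are already parsed into a dictionary; it is not GATK code, and skipping annotations that are missing from a record is an assumption of this sketch.

```python
from collections.abc import Callable, Mapping

# A call FAILS if any of these expressions is true (thresholds from the text).
FAIL_IF: dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 35.0,
    "HaplotypeScore": lambda v: v > 13.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def passes_hard_filters(info: Mapping[str, float], min_dp: int = 20) -> bool:
    """True if a call meets the coverage requirement (DP > min_dp) and none of
    the failure expressions applies; missing annotations are skipped."""
    if info.get("DP", 0) <= min_dp:
        return False
    for key, fails in FAIL_IF.items():
        value = info.get(key)
        if value is not None and fails(value):
            return False
    return True


if __name__ == "__main__":
    call = {"DP": 60, "QD": 10.0, "FS": 1.0, "MQ": 60.0, "ReadPosRankSum": -0.4}
    print(passes_hard_filters(call, min_dp=50))
```

The `min_dp` argument stands in for the per-study coverage cut-off (10, 20, 50, or 100).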

Detection of somatic mutations
To detect somatic mutations, hippocampus and blood samples from the same individual were analyzed together in each case. Base quality scores were recalibrated with GATK BaseRecalibrator. The recalibrated files were then analyzed with Virmid [15], which calls somatic mutations in paired disease-control samples, using default parameters (in our case, hippocampus was taken as the diseased tissue and blood as the control). The mutations were obtained in VCF format and annotated with SnpEff (version 2.0.5) [25].
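To make the paired hippocampus-versus-blood logic concrete, the sketch below applies a naive threshold rule to read counts at a single site: the alternative allele must be seen in the disease tissue and be absent from an adequately covered control. This is a hypothetical illustration only; it is not Virmid's algorithm, which uses a probabilistic model that also accounts for sample impurity [15]. All thresholds and the read counts in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Read counts at one genomic position in one tissue."""
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def naive_somatic_candidate(
    disease: SiteCounts,
    control: SiteCounts,
    min_alt: int = 3,
    min_control_depth: int = 20,
    max_control_alt: int = 0,
) -> bool:
    """Hypothetical rule: alt allele present in the disease tissue
    (hippocampus) but not in a well-covered control (blood).
    Ignores impurity and error rates, unlike Virmid."""
    return (
        disease.alt_reads >= min_alt
        and control.depth >= min_control_depth
        and control.alt_reads <= max_control_alt
    )


if __name__ == "__main__":
    hippocampus = SiteCounts(ref_reads=45, alt_reads=5)  # invented counts
    blood = SiteCounts(ref_reads=40, alt_reads=0)         # invented counts
    print(hippocampus.vaf, naive_somatic_candidate(hippocampus, blood))
```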

RESULTS
Validation of the analysis on a previously described SNV present in a high proportion of the tissue analyzed
As a first step, we applied our method to a sample containing a previously described SNV. Using the Genome Analysis Toolkit (GATK) [20] and following the variant analysis workflows recommended by its developers, we previously identified among our samples [6] an SNV (dbSNP ID: rs513873, A→G) at position chr11:56510623 (GRCh37 assembly of the human genome) in OR9G4 (olfactory receptor 9G4 gene), present in both the heterozygous and homozygous states (Fig. 1A). Using Virmid software, we confirmed the presence of the SNV (Fig. 1A) and then validated this finding by Sanger sequencing (Fig. 1B).
To amplify a fragment containing the SNV site, we designed two flanking DNA primers (Table 2) to obtain a short DNA fragment (718 bp) that included this SNV. The single exon of OR9G4 contains the sequence AGTACT, which the SNV converts to AGTGCT. The first (double-stranded) sequence, 5'-AGTACT-3'/3'-TCATGA-5', is cleaved by the restriction enzyme ScaI, which cuts at 5'-AGT/ACT-3' (Fig. 2A). After treatment of total DNA with this enzyme, only the DNA molecules lacking the SNV were digested. As a control, digestion of DNA homozygous for the SNV and of a control sample lacking it confirmed these results (Fig. 2B). This finding shows that a specific restriction nuclease can be used to discriminate between DNA sequences with and without the SNV.
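The expected gel pattern can be predicted in silico by cutting the amplicon wherever the ScaI site occurs. The sketch below assumes complete digestion and an exact-match, blunt-cutting site (ScaI cuts AGT^ACT). Because the real OR9G4 amplicon sequence is not given here, the example uses a hypothetical 718-bp stand-in built from filler bases, with the site placed so that the cut reproduces the 460 + 258 bp fragments described for Fig. 2.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting `seq` at every exact occurrence of `site`.

    `cut_offset` is the cut position within the site on the top strand
    (3 for ScaI, AGT^ACT, a blunt cutter). Assumes complete digestion.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0, *cuts, len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 718-bp amplicon: filler bases stand in for the real OR9G4
# sequence, with the ScaI site placed so that the cut falls after base 460.
ref_amplicon = "C" * 457 + "AGTACT" + "C" * 255  # reference allele (A)
alt_amplicon = "C" * 457 + "AGTGCT" + "C" * 255  # SNV allele (G)

if __name__ == "__main__":
    print(digest(ref_amplicon, "AGTACT", 3))  # [460, 258]
    print(digest(alt_amplicon, "AGTACT", 3))  # [718]
```

A heterozygous A/G sample would be expected to show both patterns at once, as in the A/G lane of Fig. 2B.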

Detection of low frequency brain-specific SNVs
We next tested whether the above procedure could detect SNVs present in less than 50% of the cells. As a proof of concept, we used the previously PCR-amplified 718-bp DNA fragments corresponding to samples A/A (homozygous for the reference allele, without the SNV) and G/G (homozygous for the described SNV, which alters the target sequence of the restriction enzyme ScaI). We mixed these two DNA samples, which differ at a single nucleotide, at A/A:G/G ratios of 1:1, 3:1, 5:1, and 10:1 (Fig. 3A) and repeated the procedure described above. The reference DNA (A/A), present in the larger proportion, was a suitable target for ScaI (AGTACT), whereas the DNA with the SNV (AGTGCT) was not. To remove the reference sequence, we digested the DNA mixture with ScaI. Next, we amplified and purified the undigested DNA and characterized the presence of the SNV by Sanger sequencing (Fig. 3B). Our results indicated that the method was effective in detecting variants present at a 1:10 ratio. The main problem encountered was the low yield of purified DNA recovered after electrophoresis.
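The mixing ratios translate directly into the fraction of SNV-bearing molecules in each mixture. Assuming complete ScaI digestion of the reference DNA, this fraction is also the share of the input expected to survive as the uncut 718-bp band. The sketch uses exact fractions to avoid rounding; the function name is illustrative.

```python
from fractions import Fraction


def snv_fraction(ref_parts: int, snv_parts: int) -> Fraction:
    """Fraction of SNV-bearing molecules in a ref:SNV mixture."""
    return Fraction(snv_parts, ref_parts + snv_parts)


if __name__ == "__main__":
    for ref_parts in (1, 3, 5, 10):
        f = snv_fraction(ref_parts, 1)
        print(f"{ref_parts}:1 mix -> SNV-bearing (uncut 718 bp) DNA = {f} ({float(f):.1%})")
```

Strictly, a 10:1 mixture contains 1/11 (about 9.1%) SNV-bearing DNA, close to the 9% read fraction later observed for the COL3A1 variant in brain.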

Identification of somatic brain mutations present in AD patients
We next sought to validate SNVs present in only a few cells of brain tissue. In a previous study of AD brains, we used Illumina sequencing to demonstrate the presence of somatic SNVs in this tissue that were absent from paired blood DNA samples [6]. Here, using Virmid software, we validated two of these low frequency SNVs (Table 1). We also show that the presence of these SNVs prevented DNA digestion by specific restriction enzymes, allowing the removal of bulk DNA lacking the SNVs. These features may allow low frequency SNVs detected by Illumina sequencing to be validated by standard Sanger sequencing, using the method shown in Supplementary Figure 1.
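Selecting SNVs like those in Table 1 requires checking whether the variant abolishes a restriction site in its flanking sequence. The sketch below does this with IUPAC-aware regular expressions, searching both strands. The three recognition sequences are the standard ones for ScaI (AGTACT), PstI (CTGCAG) and EcoO109I (RGGNCCY); the function names and the short context sequences in the example are hypothetical. The EcoO109I case mirrors the change described for Fig. 5 (AGGACCC to ATGACCC). A real screen would also need to check that no other copy of the site remains in the amplicon.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Recognition sequences (top strand) for the enzymes used in this work.
ENZYMES = {"ScaI": "AGTACT", "PstI": "CTGCAG", "EcoO109I": "RGGNCCY"}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the (possibly degenerate) site occurs on either strand."""
    pattern = re.compile("".join(IUPAC[b] for b in site))
    return bool(pattern.search(seq) or pattern.search(revcomp(seq)))


def sites_abolished(context: str, pos: int, alt: str) -> list[str]:
    """Enzymes that cut the reference context but not the SNV-bearing one.

    `pos` is the 0-based position of the SNV within `context`.
    """
    context = context.upper()
    alt_context = context[:pos] + alt.upper() + context[pos + 1:]
    return [name for name, site in ENZYMES.items()
            if has_site(context, site) and not has_site(alt_context, site)]


if __name__ == "__main__":
    print(sites_abolished("TTAGGACCCTT", 3, "T"))  # hypothetical flank, G->T
    print(sites_abolished("GGAGTACTGG", 5, "G"))   # hypothetical flank, A->G
```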
From the SNVs shown in Table 1, we chose a patient whose exome carried an SNV in the COL3A1 gene at position chr2:189853334 (G→T; Fig. 4A). After bioinformatics processing (Virmid), this SNV was found in brain (9% of total reads) and not in blood. Direct Sanger sequencing of brain (Fig. 4B) and blood (Fig. 4C) samples failed to detect this SNV in either tissue. Because this low frequency SNV abolishes the recognition site of the nuclease EcoO109I, we digested brain DNA with this nuclease to remove bulk DNA.
Figure 5 shows the scheme: when the whole DNA (many molecules lacking the SNV and a few molecules containing it) was digested with EcoO109I, only the undigested molecules, which carry the SNV, were isolated. These were purified by gel electrophoresis, amplified by PCR, and sequenced by Sanger's method. However, DNA recovery after gel electrophoresis was very low, which hindered proper amplification of the residual DNA and prevented us from obtaining unequivocal data.

Specific amplification of fragments bearing the SNV of interest
First, a rolling circle replication approach was designed to amplify only the circularized fragments containing the SNV of interest. This was achieved by combining φ29 DNA polymerase with a specific hexanucleotide containing the EcoO109I motif (5'-ACCAAG-3'), in which G was changed to T, the somatic SNV to be detected. However, the amount of DNA produced was insufficient for Sanger sequencing. We therefore used an alternative approach (TruePrime™), based on φ29 DNA polymerase and TthPrimPol (Picher et al., submitted), to specifically and exponentially amplify this very low amount of DNA. The circularized DNA fragments containing the SNV were amplified by rolling circle replication using TruePrime™ (see Methods) in the presence of specific hexanucleotides, in order to enrich the amplification products with molecules bearing the SNV of interest (Fig. 6A). We recovered at least 1 µg of amplified DNA per sample.

Fig. 2. Scheme showing the procedure developed to validate SNVs using restriction enzymes. A) The single nucleotide variation A→G modifies the recognition site of the restriction enzyme ScaI, thereby preventing the enzyme from cutting the DNA. We designed two specific DNA primers to amplify a short region (718 bp) covering the site of this SNV in the genome. Digestion of this fragment yields two smaller fragments (460 bp + 258 bp) when the SNV is absent, or a single uncleaved fragment (718 bp) when the SNV is present. B) Result of digesting the PCR-amplified fragments (see Fig. 1B) with ScaI. Samples were loaded onto a 1% agarose gel after treatment with ScaI. As indicated in Fig. 1B, the presence of the SNV prevents the enzyme from cutting the site. In A/A, the entire original 718-bp fragment has been digested into two fragments of 460 + 258 bp; G/G is homozygous for the SNV, so no cut has occurred. Finally, A/G, which is heterozygous for the SNV, shows a mixture of the two previous results, with one half of its DNA digested and the other not.

Fig. 3. Method to amplify a sample bearing an SNV that is present in a low proportion. A) Two samples containing a short 718-bp PCR-amplified DNA fragment, either G/G or A/A for an SNV that abolishes the recognition site of the restriction enzyme ScaI (see Fig. 1), were mixed at different ratios (1:1, 3:1, 5:1, and 10:1) such that the G/G sample, which contains the SNV and is therefore not recognized by ScaI, was always the minority component. All the mixed samples were treated with this enzyme and loaded onto a 1% agarose gel. The intensity of the 718-bp band is proportional to the fraction of DNA bearing the SNV. B) The undigested 718-bp band was recovered from the gel and sequenced by Sanger's method. Note that this uncut band contains the described SNV.
When this amplified DNA was sequenced by Sanger's method (Fig. 6C), we detected the described SNV, although it generally appeared together with some DNA lacking the variant. This could be due to a low proportion of undigested DNA lacking the SNV, or to a small amount of DNA from other contaminating cells that produces a background of other nucleotide sequences. This background was more evident in brain tissue; we consistently obtained cleaner sequences from blood cells. Our results show that, when combined with specific DNA digestion, the above procedure allows the detection of low frequency SNVs (see also Figs. 1-3). Two amplification variables were examined: oligonucleotide concentration, and the order in which φ29 DNA polymerase and TthPrimPol (the TruePrime™ primase) were added. For the first, increasing oligo concentrations, from 900 fM to 900 pM, were used. For the second, all components of the amplification mixture were added at the same time in Method A, whereas in Method B all components (including φ29 DNA polymerase) except TthPrimPol were added for a first incubation (1 h at 30 °C), after which TthPrimPol was added and the whole mixture was incubated further (2 h at 30 °C) (see Fig. 6A).
Figure 6B shows the results obtained for different samples amplified with different oligo concentrations using Method A or Method B. The corresponding Sanger sequences are shown in Fig. 6C.
In brain samples, SNVs present in less than 9-10% of total reads may be at the detection limit for validation by this method (Fig. 6). Indeed, using Virmid software, we also observed specific SNVs in the other gene shown in Table 1 (see also Supplementary Figure 1). We tested the method described in Fig. 7 with the SNV found at position chr4:151835421 (G→A) of LRBA. This SNV was present at low frequency in brain (6% of total reads), was not detected by standard Sanger sequencing (Supplementary Figure 2), and was absent in blood. Most DNA molecules carried the sequence 5'-CTGCAG-3', the cleavage motif of the restriction nuclease PstI, which the SNV abolishes (Table 1). We therefore digested the molecules lacking the SNV with PstI and amplified the undigested DNA by PCR. However, the PCR yielded very little DNA, and we could not obtain clear readouts by Sanger sequencing. To improve amplification, we used TruePrime™. This produced only a modest improvement, and the sequence with the specific SNV was contaminated by other sequences (Supplementary Figure 3). Similar results were obtained when two other genes, WASF3 and HSDL2, were tested (data not shown). We thus conclude that the procedure described herein is suitable to validate, by Sanger sequencing, low frequency allelic variants present in at least 10% of DNA molecules.

DISCUSSION
Here we report a novel method (Fig. 7) to validate low frequency SNVs previously identified by Illumina sequencing in various tissues of a single donor. The method involves bioinformatics processing based on previously described tools [15]. After the DNA lacking the somatic variation is removed by digestion with specific restriction nucleases that discriminate a single-nucleotide difference, the remaining SNV-bearing DNA can be validated by Sanger's method. Although suitable nucleases are not available for every SNV found, the good correlation between the bioinformatics data and the cases validated by Sanger's method after nuclease treatment supports the suitability and robustness of the bioinformatics procedures used to identify low frequency variants.

Fig. 5. Scheme showing the procedure developed to validate the presence of the SNV in hippocampus samples. As in the previous figure, the presence of the SNV modifies the cut site for the restriction enzyme EcoO109I, which recognizes the site AGGACCC present in the sequence. The variation changes this sequence to ATGACCC, which is not recognized by the enzyme, so the sequence cannot be cut. This feature allowed us to recover the undigested band from a gel and to amplify the sample by PCR for subsequent sequence analysis using Sanger's method.
As a first test of the use of restriction nucleases, we examined whether we could fractionate DNA containing or lacking a given SNV in a donor heterozygous for this SNV. The gene containing the SNV, OR9G4, has been analyzed previously and the SNV reported (1000 Genomes Project, http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/), with allele frequencies in the Iberian population of A = 0.7009 and G = 0.2991. To detect a specific SNV present in less than 1/10 of the total reads, TruePrime™ technology (Picher et al., submitted) was used for rolling circle DNA amplification, with a specific oligomer primer whose motif contained the SNV.
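The reported Iberian allele frequencies also indicate how often each OR9G4 genotype should be encountered among donors. The sketch below assumes Hardy-Weinberg equilibrium, which the source does not test; under that assumption, roughly 49% of individuals would be A/A, 42% A/G, and 9% G/G, so donors of all three genotypes, as used in Fig. 1, are not unusual.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Expected genotype frequencies for a biallelic site, given the
    frequency `p` of allele A, assuming Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"A/A": p * p, "A/G": 2 * p * q, "G/G": q * q}


if __name__ == "__main__":
    for genotype, freq in hardy_weinberg(0.7009).items():  # Iberian A frequency
        print(f"{genotype}: {freq:.3f}")
```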
Moreover, somatic mutations in single human neurons have recently been reported using single-cell sequencing [3,4], with DNA damage occurring during transcription proposed as a main source of these SNVs (see also the review [27]).
It has also been suggested that somatic SNVs contribute to the development of neurodevelopmental diseases [28]. In addition, new point mutations may arise at CpG dinucleotides, whose cytosines can be methylated [29]. Four of the six arginine codons begin with a CG dinucleotide (CGN), and glycine codons (GGN) form a CpG when preceded by a codon ending in C, which may explain why arginine or glycine residues are changed to other residues in a large proportion of non-synonymous somatic mutations [30]. Curiously, in one of our cases, we observed a glycine substitution (Table 1). Moreover, none of the SNVs in Table 1 (AD patients) were described as single nucleotide polymorphisms in GWAS Central, whereas the SNV shown in Fig. 1 (non-demented controls) was already listed there. These results are consistent with the SNVs in Table 1 having arisen by somatic mutation.
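The CpG argument above can be made concrete with a small helper that tests whether a substituted base lies in a CpG dinucleotide. Deamination of 5-methylcytosine yields thymine, so mutations at these sites typically appear as C→T transitions, or as G→A when read on the opposite strand. This is a minimal sketch with hypothetical function names; it inspects sequence context only and does not know the actual methylation status of any site.

```python
def in_cpg(seq: str, pos: int) -> bool:
    """True if the base at 0-based `pos` is the C or the G of a CpG
    dinucleotide on the given strand."""
    seq = seq.upper()
    base = seq[pos]
    if base == "C":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"
    if base == "G":
        return pos > 0 and seq[pos - 1] == "C"
    return False


def is_cpg_transition(seq: str, pos: int, alt: str) -> bool:
    """C>T at the C of a CpG, or G>A at its G (the same event on the other
    strand), the signature expected from methylated-cytosine deamination."""
    change = (seq[pos].upper(), alt.upper())
    return in_cpg(seq, pos) and change in {("C", "T"), ("G", "A")}


if __name__ == "__main__":
    print(in_cpg("CGT", 0))                 # first base of an arginine CGN codon
    print(is_cpg_transition("ACGT", 2, "A"))
    print(is_cpg_transition("AGGT", 1, "A"))
```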
Regarding the genes analyzed in this study, OR9G4 belongs to the olfactory receptor family of G-protein-coupled receptors; the function of this particular member has not been characterized. COL3A1 encodes type III procollagen, and its mutations cause type IV Ehlers-Danlos syndrome [31]. LRBA participates in intracellular vesicle trafficking, and its expression is induced by lipopolysaccharide (LPS). Alterations in this gene are associated with a syndrome of immune deficiency and autoimmunity [32].
In summary, here we describe a method to identify low frequency somatic SNVs in non-proliferating cells such as neurons, which can subsequently be validated by Sanger sequencing. This method includes: a) the use of Virmid software; b) the removal of DNA molecules lacking the somatic SNV by restriction nuclease digestion; and c) amplification of the DNA molecules containing the SNV of interest by means of TruePrime™ technology and oligonucleotides containing the motif in which the SNV is present. The limitations of the method are that a restriction nuclease able to digest DNA lacking the SNV of interest is not always available, and that the minimal percentage of alternative reads that can be reliably validated by Sanger sequencing is 10% when brain tissue is used. Nevertheless, the proposed method could help validate low frequency tissue-specific mosaicism not only in the brain and in neurological diseases, but in any tissue of interest. Furthermore, our data support the reliability of Virmid software for identifying low frequency SNVs in next-generation sequencing data.

Fig. 7. Scheme showing the method developed in this work to detect somatic mutations that can be validated by Sanger sequencing. After DNA samples were obtained from blood and hippocampus, their exomes were processed as indicated in Materials and Methods to obtain the reads in FASTQ format in each case. The reads were first converted to BAM format and processed with GATK and Picard tools [20]: IndelRealigner, MarkDuplicates, and base quality recalibration (see Methods), yielding recalibrated (recal.bam) files. The recal.bam files from blood and hippocampus were then processed with Virmid, comparing hippocampal and blood samples in each case to obtain somatic mutations in the hippocampus. Additional steps include searching for somatic mutations located in the recognition motif of a specific restriction endonuclease; using that enzyme to enrich the sample for SNV-bearing DNA, which the enzyme cannot cut; and recovering the uncut DNA from a gel and amplifying it, by PCR when the SNV is present in at least 10% of the cells, or by TruePrime™ when a lower percentage of cells (or a very low amount of DNA) carries the SNV, to obtain a sufficient amount of DNA. Finally, the sample can be sequenced by Sanger's method.

Fig. 1. Scheme showing the gene region where the SNV, determined by Illumina sequencing, is present. A) A single nucleotide variation in the OR9G4 gene (dbSNP ID [2]: rs513873, A→G) at position chr11:56510623 (GRCh37 assembly of the human genome) was detected. To test the efficiency of the Illumina-Virmid method, we sequenced three samples with and without the variation, as described previously: A/A is homozygous for the reference allele and lacks the SNV, G/G is homozygous for the described SNV, and A/G is heterozygous, carrying the reference allele on one copy of chromosome 11 and the SNV on the other. The figure shows the alignments of the processed reads in the regions flanking this SNV. Each sample shows a given number of reads with the reference allele and/or the alternative allele (corresponding to the described SNV), thereby confirming its genotype (A/A, A/G, or G/G). B) Chromatograms from Sanger sequencing of the fragments containing (or not) the previously described SNV. The results obtained by Illumina sequencing in Fig. 1A and the genotypes of the samples are corroborated here. The arrows indicate the exact position of the SNV in the sequence, pointing to a single peak in the homozygous samples (A/A and G/G) and a double peak in the heterozygous sample (A/G).

Fig. 4. Alignments of the reads for the exomes of two different tissues (hippocampus and blood) from the same individual (2H). A) Alignments at a somatic mutation detected by Virmid software (see Methods). This site is on chromosome 2 at position 189853334 in COL3A1 and corresponds to a G→T change. This variation is found in only 9% of the hippocampal reads and is absent in blood. Brain and blood samples were therefore sequenced by Sanger's method. B) Sanger sequencing of the brain (hippocampal) DNA sample containing the SNV in COL3A1 at a low proportion: the SNV was not detected. C) Sanger sequencing of blood DNA from the same person as in (A), which lacks the SNV in COL3A1. Cleaner sequences were obtained from blood than from brain samples.

Fig. 6. Amplification by TruePrime™ of brain DNA containing the specific SNV, and Sanger sequencing of the amplified samples. A) Scheme of circular DNA amplification by TruePrime™ using oligonucleotides containing the specific SNV. B) Effect of oligo concentration and amplification method on obtaining the sequence bearing the SNV. C) Sanger sequences of samples 2 (negative), 4, 5, 6, 7, and 8.

Table 1
SNVs from AD patients validated by the present method

Table 2
Oligonucleotides used for DNA amplification by PCR (f, forward; r, reverse)