Whole genome grey and white matter DNA methylation profiles in dorsolateral prefrontal cortex

The brain's neocortex is anatomically organized into grey and white matter, which are mainly composed by neuronal and glial cells, respectively. The neocortex can be further divided in different Brodmann areas according to their cytoarchitectural organization, which are associated with distinct cortical functions. There is increasing evidence that brain development and function are governed by epigenetic processes, yet their contribution to the functional organization of the neocortex remains incompletely understood. Herein, we determined the DNA methylation patterns of grey and white matter of dorsolateral prefrontal cortex (Brodmann area 9), an important region for higher cognitive skills that is particularly affected in various neurological diseases. For avoiding interindividual differences, we analyzed white and grey matter from the same donor using whole genome bisulfite sequencing, and for validating their biological significance, we used Infinium HumanMethylation450 BeadChip and pyrosequencing in ten and twenty independent samples, respectively. The combination of these analysis indicated robust grey‐white matter differences in DNA methylation. What is more, cell type‐specific markers were enriched among the most differentially methylated genes. Interestingly, we also found an outstanding number of grey‐white matter differentially methylated genes that have previously been associated with Alzheimer's, Parkinson's, and Huntington's disease, as well as Multiple and Amyotrophic lateral sclerosis. The data presented here thus constitute an important resource for future studies not only to gain insight into brain regional as well as grey and white matter differences, but also to unmask epigenetic alterations that might underlie neurological and neurodegenerative diseases.


| I N T R O D U C T I O N
The neocortex is the largest structure in the human brain representing 82% of total brain mass and containing >80 billion cells. It is the brain region where cognition, language and complex behaviors take place and are being planned, and where neurons and glial cells form a patterned structure that defines the two main divisions of the brain: the grey and the white matter. The grey matter, the greyish appearance of which is the consequence of the presence of cell bodies, is mainly composed by neurons and glial cells (glia/neuron ratio 1.5) whereas the white matter, the whiteness of which is due to the myelin that covers axons, is mostly composed by glial cells (glia/neuron ratio 15.5) (Azevedo et al., 2009). Therefore, unraveling the epigenetic profiles governing grey and white matter will provide information about the epigenetic landscapes of neurons and glial cells. This quest has recently started to be addressed using genome-wide (Guintivano, Aryee, & Kaminsky, 2013;Iwamoto et al., 2011;Kessler et al., 2016;Kozlenkov et al., 2014;Labonte et al., 2013) and whole-genome approaches (Lister et al., 2013). In these studies, immunolabeled neuron and glial cells were isolated and their DNA methylation patterns examined allowing for the comparison of pure populations of cellular types, yet without considering their spatial organization. This is an important handicap because several studies have highlighted the importance of considering anatomical compartments. Indeed, neuron and glial cells show specific spatial profiles of gene expression according to their emplacement within anatomical structures (Bohland et al., 2010;Ko et al., 2013), which is also evident by characteristic DNA methylation profiles indicating an important spatial regulation as shown in other brain areas (Davies et al., 2012;Hernandez et al., 2011;Kozlenkov et al., 2016;Ladd-Acosta et al., 2007;Lee et al., 2011;Sanchez-Mut et al., 2013;Xin et al., 2010).

| Human samples
Post-mortem tissues were obtained from the Institute of Neuropathology Brain Bank (HUB-ICO-IDIBELL Biobank) following the practices and expertise of BrainNet Europe Bank (http://www.brainnet-europe. org/) "Network of Excellence" funded by the European Commission in the 6th Framework Program "Life Science" (LSHM-CT-2004-503039).
All samples were obtained in agreement with ethical issues and legislations defined by the European Union and after the approval of the local Ethic Committee. DNA was extracted from grey and white matter of dorsolateral prefrontal cortex (Brodmann area 9). The EZ DNA methylation kit (Zymo Research) was used for bisulphite concersion of all DNA samples used as previously described .
Previously reported WGBS gray and white matter data from the same control donor were used (GSE47966 GSM1173772-Grey matter- (Lister et al., 2013); GSM1279516 GSE52271-white matter- (Sanchez-Mut et al., 2016); female 64 years old). Five additional control grey and white matter were hybridized on 450K arrays (three females and two male aged 69.4 6 4.3 and four females and one male aged 64.4 6 9.7 respectively). A third set of ten control grey and 10 white matter samples was used for pyrosequencing validation (seven females and three male aged 70.3 6 3.8 and eight females and two male aged 69.1 6 5.1 respectively).

| Pyrosequencing
The set of primers for PCR amplification and sequencing (Supporting Information Table S1) were designed using the software PyroMark assay design version 2.0.01.15 (Qiagen); amplification primers hybridize with CpG-free sites to ensure methylation-independent reaction and one primer (opposite to the sequencing primer) is biotinylated to convert the PCR product to single-stranded DNA templates. We used 1 lL of bisulphite-treated DNA for each PCR. We used the Vacuum Prep Tool (Quiagen) to prepare single-stranded PCR products according to manufacturer's instructions. Pyrosequencing reactions and methylation quantification were performed in a PyroMark Q96 System version 2.5 (Qiagen).

| Whole-genome bisulfite sequencing data processing
Sequence alignment and methylation calling were performed with version 0.7.4 of Bismark software (Krueger & Andrews, 2011). We used hg19 as reference genome and retrieved genomic information from Biomart (Haider et al., 2009). SAM/BAM and BED files handling was done using SAMtools (Li et al., 2009), bedtools (Quinlan & Hall, 2010) and Tabix (Li, 2010). Statistical analysis and graphic representation was performed with R (http://www.R-project.org) and libraries multicore and ggplot2. We defined the promoter region as 2 kb flanking the transcription start site (TSS). TSS was considered to be the upstream-most base of all the transcripts of the gene. Smoothed methylation profiles were generated following the BSmooth pipeline (Hansen, Langmead, & Irizarry, 2012).

| CpG distance correlation
We assessed distance correlation of CpG in close proximity gathering information about methylation and relative distance up to 2000 bases away from all CpG sites and correlating pair-wise methylation at single CpG sites for each relative distance.

| Detection of differentially methylated regions
Differentially methylated regions (DMR) were identified as regions with >5 consecutive CpG sites that showed a consistent difference between the 95% CI of the smoothed methylation profile (Hansen et al., 2012).

| Detection of histone marks
We obtained human histone marks for human adult brain inferior temporal lobe available at the ENCODE project (GSE17312) and CCAT peak caller (Xu et al., 2010) with the standard parameter for histone marks.

| Microarray-based DNA methylation analysis with Infinium 450k array
The Infinium methylation assay was carried out as described previously . p-values for probes covering and following consistent tendencies with the identified DMRs were obtained using Kruskal-Wallis test and false discovery rate (FDR < 0.05) adjustment for multiple comparisons.

| RNA expression profiling
RNA expression profiling was performed following the Pico Profiling method (Gonzalez-Roca et al., 2010). Each sample analyzed by WGBS was also hybridized to a Human Gene ST 1.0 Arrays (Affymetrix). The arrays were scanned in a GeneChip Scanner 3000 (Affymetrix). CEL files were generated from DAT files using GCOS software (Affymetrix).
To generate the log 2 expression estimates, overall array intensity was normalized between arrays and the probe intensity of all probes in a probeset summarized to a single value using RMA (Robust Multichip Average) algorithm (Irizarry et al., 2003). 2.9 | Disease-associated genes AD-associated genes were obtained from AlzGene database (Bertram, McQueen, Mullin, Blacker, & Tanzi, 2007), PD-associated genes from PDGene Lill, Roehr et al., 2012), Amyotrophic lateral sclerosis-associated genes from ALSGene (Lill, Abel, Bertram, & Al-Chalabi, 2011), and Multiple Sclerosis-associated genes from MSGene Lill, Roehr et al., 2012). Relative gene amounts was assayed using two-tailed Fisher's exact test and the level of significance (alpha) set at p < .05.
Next, we asked how methylation values were distributed. As expected, we observed a similar bimodal distribution of CG methylation in both samples with 88% of Cs methylated (>66% average methylation value) and 5% of Cs unmethylated (<33% average methylation value). On the contrary, non-CG methylation showed a single distribution with most of Cs being unmethylated (90% of total Cs with average methylation value lower than 33%) even in grey matter where overall methylation levels were higher (Supporting Information Figure S1).
In keeping with previous results (Heyn et al., 2012), we observed a significant correlation between neighboring CG methylation site levels (Supporting Information Figure S2). On the contrary, no correlation was observed for non-CG methylation. This might be explained by the fact that CG methylation shows a bimodal distribution and tends to form clusters whereas non-CG methylation tends to be more scattered (Hebestreit, Dugas, & Klein, 2013;Lister et al., 2013).
Next, we assessed the relation between CG and non-CG methylation. To solve the problem of the scattered pattern of non-CG methylation, we defined flanking windows of 500 bp and 2 kb around the CpG islands and interrogated the non-CG methylation in the neighborhoods of CG methylated and unmethylated islands.
Interestingly, CG and non-CG methylation were correlated in grey and white matter samples with the former showing higher correlation values (Supporting Information Figure S3; R 5 0.51 p-value < 0.001 and R 5 0.21 p-value < 0.001; Pearson correlation values for 500 bp and 2 kb windows respectively). In keeping with previous results (Guo et al., 2014;Kulis et al., 2015;Sun et al., 2016;Ziller et al., 2011), we observed a correlation between CG and non-CG methylation which was stronger in grey matter where neurons, with higher levels Considering all of the aforementioned, we then decided to focus all subsequent analyses on CG methylation for the following reasons: (1) The functional relevance of CG methylation in gene regulation has been unequivocally demonstrated (Vinson & Chatterjee, 2012); (2) the correlation between neighboring CG methylation site levels allows the use of smoothing algorithms increasing the power, accuracy and sensitivity to detect differentially methylated regions (DMRs) (Hansen et al., 2012); (3) high-density array platforms are designed preferentially for interrogating CG methylation sites with the non-CG methylation probes being almost negligible; and (4) in contrast to the scattered pattern of non-CG methylation, differentially methylated CG loci can be precisely determined facilitating their validation using locus-specific techniques such as pyrosequencing.

| Differentially methylated regions
To assess CG methylation in more detail, we used previously published smoothing methods (Hansen et al., 2012) (Kozlenkov et al., 2014;Sun et al., 2016), supporting the accuracy of our results.

| Epigenetic crosstalk between DNA methylation and histone marks
After that, we interrogated the relation between DNA methylation and histone marks using the comprehensive epigenetic profiling of human brain samples within the Roadmap project (Bernstein et al., 2010) because there is an important crosstalk between both epigenetic marks (Cedar & Bergman, 2009), in particular also for higher cognitive functions (Miller & Sweatt, 2006). As expected, H3K4me3 and H3K9ac, which are enriched on active promoters, tend to be non-methylated (Supporting Information Figure S4). In contrast, H3K9me3, a marker of

| Correlation between DNA methylation and RNA expression
In addition, we examined the correlation between DNA methylation and RNA values using GeneChip human gene 1.0ST array data (Supporting Information Table S2). In keeping with the idea that promoter DNA methylation represses RNA expression (Kelly et al., 2012), pro-moter DNA methylation was negatively correlated with gene expression ( Figure 1C; Grey matter R 5 20.49; White matter R 5 20.47; Pearson correlations, p-value < 0.001), supporting the biological significance of the observed DMRs. However, in contrast with previous observations (Zemach, McDaniel, Silva, & Zilberman, 2010), no significant differences in RNA expression were observed in relation to gene body DNA methylation (data not shown).

| Validation of WGBS using 450K illumina platform and pyrosequencing
To validate WGBS results, we took advantage of the Infinium Human Methylation 450 BeadChip Kit which interrogates >485,000 methylation sites per sample at single-nucleotide resolution. DNA methylation profiles obtained from both platforms were highly correlated (R 2 5 0.957 and R 2 5 0.962 for grey and white matter respectively). In detail, from 11,860 DMRs identified by WGBS (Supporting Information Table   S3), 4,496 (38%) were also covered by 450K arrays and 3,820 (85%) of those were technically validated between both technologies   Figure S5). For further biological validation and for avoiding subject-specific differences, we used an additional five grey and five white matter samples. Importantly, 4,010 of covered DMRs (89,2%) showed consistent tendencies between independent samples (Supporting Information Figure S5) and 2,806 (70%) significant differences (Supporting Information Table S3; FDR < 0.05) reinforcing the biological importance of these findings. As expected, a clustering analysis of 450K CpGs covering DMRs showed a clear difference of DNA methylation between grey and white matter profiles in BA9 (Figure 2). Lastly, these data were also technically and biologically validated by pyrosequencing using a third set of ten grey and ten white matter samples. WGBS DMRs showing significant differences in the 450 K arrays were sorted according to their CpG mean differences and the top ranked genes were interrogated by pyrosequencing.
These results revealed similar DNA methylation differences between grey and white matter in WGBS, 450K data, pyrosequencing and RNA expression (Supporting Information Table S4, and Supporting Information Figure S6).

| Enrichment of DMR on glial and neuronal genes
Noticeably, we observed an important overlap of DMRs with promoters of neuronal (e. g., ENO2, TUBB3, MAP2, and MAPT) and glial genes (e. g., GFAP, GLUL, S100B, and NOTCH1 for astrocytes and MBP, SOX10, MAG, and MOG for oligodendrocytes). As expected, and in line with the negative correlation between promoter DNA methylation and  gene expression, genes preferentially expressed in neurons were hypomethylated in grey matter whereas genes preferentially expressed in glial cells markers were hypomethylated in white matter ( Figure 3A).
Consistently, an outstanding number of genes differentially expressed genes between neurons, astrocytes, and oligodendrocytes (Cahoy et al., 2008) also shows differences in DNA methylation (Supporting Information Table S5).
Next, we compared our data with previously published and NCBI GEO accessible data sets investigating DNA methylation differences between neuron and glial cells. Our DMR list comprises >90% of genes covered by both platforms and differentially methylated in the 450K data set GSE41826 consisting of 29 neuron and glia paired prefrontal cortex samples (average DNA methylation between NeuN 1 and NeuN 2 >5%; FDR < 0.05) (Guintivano et al., 2013). Besides, since human and mouse DNA methylation differences between neurons and glial cells are highly conserved (Kessler et al., 2016), we compared our DMR list with the WGBS data set GSM11737786-91 consisting of three paired cortex murine samples (Lister et al., 2013) Information Table S6).
Similarly, Protein ANalysis THrough Evolutionary Relationships (PAN-THER) (Thomas et al., 2003) terms such as synaptic vesicle trafficking, acetylcholine, oxytocine, GABA, and glutamate signaling genes were also enriched in DMRs between grey and white matter (Table 1). In this regard, these data are in line with the expected gene enrichments, presumably a direct consequence of the different composition of cells between grey (neuron-like) and white matter (glial-like). Interestingly, KEGG and PANTHER analysis also identified a significant enrichment of DMR in genes associated with AD, PD, and HD as well as Amyotrophic lateral sclerosis (ALS) ( Table 1 and Supporting Information Table   S6) indicating that an important fraction of differentially methylated genes between grey and white matter are associated with neurodegenerative diseases.

| DISCUSSION
In this study, we used a novel approach to determine the specific characteristics of grey and white matter divisions in BA9. Acutely dissected grey and white matter from BA9 was isolated and its DNA extracted.

WGBS and Human Infinium Human Methylation 450 BeadChip Kit
arrays were used to determine their DNA methylation profiles and the resulting data was further validated by pyrosequencing.
Despite that WGBS can provide data on CG and non-CG methylation, we decided to focus on CG methylation for several practical regions. First, the effect of CG methylation on gene regulation has largely been demonstrated (Vinson & Chatterjee, 2012). Second, CG methylation of neighboring CG sites is strongly correlated facilitating the application of smoothing algorithms that increase the power and significance of the results (Hansen et al., 2012). Third, high-density array platforms interrogate mostly CG methylation facilitating the cross-validation between platforms (Sandoval et al., 2011). And finally, in contrast to the scattered pattern of non-CG methylation, loci of differentially CG methylation can be precisely determined facilitating the validation of the data using alternative techniques such as pyrosequencing.
Doing so, we found an important enrichment of DMRs in functionally relevant genome regions, especially in enhancers, which are, in turn, strongly associated with cell-type specificity (Creyghton et al., 2010;Zhu et al., 2013), consistent with previous reports (Kozlenkov et al., 2014;Sun et al., 2016). Similarly, our data show a significant negative correlation between promoter DNA methylation and gene expression, consistent with the idea that promoter DNA methylation represses RNA expression (Kelly et al., 2012). Finally, our results were also in agreement with the expected differences in cell type markers in grey and white matter as well as between grey and white matterspecific molecular pathways, supporting the biological significance of the observed DMRs.
To our knowledge, our study is the first to provide a complete and detailed landscape of DNA methylation profiles between the anatomically distinct grey and white matter. Previous studies interrogating

DNA methylation profiles of neuronal and glial cells have used isolated
NeuN-positive cells and ChIP-on-chip assays (Iwamoto et al., 2011;Labonte et al., 2013). ChIP-on-chip assays consist of immunoprecipitating DNA using antibodies that recognize 5mC or MBD proteins, a technique that is known to introduce a certain bias of sequence recognition and to be limited in resolution imposed by the array. Therefore, this technique can only represent a partial view of bona fide DNA methylation profiles. Others have used restriction enzyme-based methods (Kozlenkov et al., 2016;Montano et al., 2013;Sun et al., 2016) or Infinium HumanMethylation450 BeadChip arrays (Guintivano et al., 2013;Kozlenkov et al., 2014;Kozlenkov et al., 2016) that provide a broader representation of the genome but are biased to CpG-rich regions. Circumventing this problem, Lister and collaborators have recently reported a WGBS study on isolated NeuN cells (Lister et al., 2013), which is not biased by immunoprecipiation and which analyzes virtually all nucleotides of the genome in a comparable resolution to our data.
Nonetheless, all these studies are based on NeuN-stained isolation of cells, which does not take the spatial information nor the anatomical specialization between different brain regions into consideration. Yet, neuronal and glial cells show characteristic spatial expression profiles according to specific anatomical structures (Bohland et al., 2010;Ko et al., 2013), which are also reflected at DNA methylation profiles (Davies et al., 2012;Hernandez et al., 2011;Ladd-Acosta et al., 2007;Lee et al., 2011;Sanchez-Mut et al., 2013;Xin et al., 2010).
Notwithstanding, our approach might be limited by imperfections in dissection and the impurity of the grey and white matter divisions and the limited number of samples analyzed. Furthermore, standard bisulfite conversion cannot recognize hydroxymethylation (Huang et al., 2010), an emerging epigenetic modification in the brain with various functions including cognition (Massart et al., 2014;Mellen, Ayata, Dewell, Kriaucionis, & Heintz, 2012), nor can we capture the highly dynamic nature of DNA methylation, including DNA methylation changes associated with learning and memory (Gräff & Mansuy, 2008;Lubin, Gupta, Parrish, Grissom, & Davis, 2011;Miller & Sweatt, 2006;Numata et al., 2012).
Interestingly, our analysis has revealed an outstanding enrichment of DMR in genes associated with several neurological and neurodegenerative diseases. In particular, we have observed an enrichment of DMRs in genes previously associated with AD, PD, HD, MS and ALS.
AD-and PD-related genes such as CLU and MAPT, and LRRK2 and PINK1 are differentially methylated between grey and white matter, as well as IL1R1 and TNFRSF1A, which are associated with MS and ALS, respectively. In line with the repressive effect of promoter DNA methylation, these genes tend to be hypermethylated ( Figure 3B) and less expressed (Supporting Information Table S2) in white matter, where a higher proportion of glial cells are observed. In follow-up studies, it will be interesting to determine whether these genes are differentially methylated in brain regions particularly affected in neurodegenerative diseases. First attempts along this line, exemplified by MAPT DNA methylation, have shown a complex scenario in which the specific brain region DNA methylation patterns can be influenced by the genetic background and the disease status (Coupland et al., 2014).
These observations indicate that genes associated with neurodegenerative diseases are differentially regulated between grey and white matter, which are in turn differentially affected in such diseases, reinforcing the necessity of considering the cellular and spatial compartmentalization in neurodegenerative studies. In addition, these data also suggest that such genes play a major role in grey matter and, by extension, in neurons, while other genes showing the opposite relation might need to be interpreted as glia-relevant genes. Also, since neurodegenerative diseases are primarily affecting particular cell types and specific brain regions, an alteration of the cellular ratios and spatial distribution of their specific epigenetic patterns are expected. As a consequence, a direct comparison between control and disease-affected tissues should not only result in alterations underlying neurodegeneration diseases but also in differences between cell patterns that can be present even in studies based on neuron/glia fluorescence-activated cell sorting (FACS), where a minimal contamination of cells cannot be completely discarded. Therefore, previous studies investigating neurodegeneration-related DNA methylation changes in genes showing differences between grey and white matter, or neuronal and glial cells, ought to be carefully interpreted in light of our observations presented here (for a detailed revision of genes previously associated with AD see (Sanchez-Mut & Gräff, 2015), PD see (Feng, Jankovic, & Wu, 2015), MS see (Li, Xiao, & Chen, 2016), and ALS see (Paez-Colasante, Figueroa-Romero, Sakowski, Goutman, & Feldman, 2015)).
Taken together, the results presented here offer a complete and detailed landscape of DNA methylation profiles of grey and white matter in BA9 that constitutes an important resource to assist future studies determining cell type-specific (epigenetic) differences pertinent (or not) to this region of the cortex. Additionally, our data may also be relevant to unmask and better understand epigenetic alterations underlying neurological and neurodegenerative diseases, and to assist case-control studies where cell type compositions are not considered.