Solution equilibria of the cytosine- and guanine-rich sequences near the promoter region of the n-myc gene which contain stable hairpins within lateral loops

Sanae Benabou, Rubén Ferreira, Anna Aviñó, Carlos González, Ramon Eritja, Joaquim Jaumot, Raimundo Gargallo
1. Solution Equilibria and Chemometrics group (Associate Unit UB-CSIC), Department of Analytical Chemistry, University of Barcelona, Diagonal 645, E-08028 Barcelona, Spain
2. Institute for Advanced Chemistry of Catalonia (IQAC-CSIC), CIBER-BBN Networking Centre on Bioengineering, Biomaterials and Nanomedicine, Jordi Girona 18-26, E-08034 Barcelona, Spain
3. Institute of Physical Chemistry “Rocasolano”, CSIC, Serrano 119, E-28006 Madrid, Spain


Introduction
Cytosine-rich regions of DNA can form characteristic structures known as i-motifs. The core of this structure is formed by pairs of cytosine bases held together by three hydrogen bonds. The key characteristic is that one of the cytosines involved in each base pair must be protonated at N3. Because of this requirement, i-motif structures can only be formed at pH values lower than 7, with maximal stability at a pH around 4.5, i.e., near the pKa of free cytosine. The in vitro formation of i-motif structures has been proposed for sequences corresponding to the cytosine-rich strand of telomeric DNA [1; 2], the human centromeric satellite III [3], and genes such as VEGF [4], bcl-2 [5], k-ras [6], RET [7], c-myc [8], c-jun [9], and Rb [10]. Despite the requirement of an acidic medium for high stability, the potential role of i-motif structures in the control of gene expression is being investigated [8]. In addition, the structural properties of the i-motif have attracted interest for potential applications in nanotechnology [11; 12].
Cytosine-rich regions are accompanied by the complementary guanine-rich regions, which may form a special structure known as the G-quadruplex. The core of this structure is formed by two or more tetrads, each of which is an ensemble of four guanine bases bonded by hydrogen bonds in nearly the same plane. The in vitro formation of such structures has been shown in DNA sequences corresponding to the ends of telomeres [13] and to the promoter regions of several oncogenes [14; 15]. Recently, the existence of G-quadruplexes in vivo has been shown [16]. Efforts are being made to develop drugs that selectively bind to G-quadruplex structures and thereby modulate the expression of certain genes [17].
In vivo, the coexistence of the guanine-rich and cytosine-rich strands suggests that the Watson-Crick duplex is the major species, with the intramolecular structures (i-motif and G-quadruplex) present only in residual proportions, if at all. Nevertheless, efforts are being made to quantify the formation of the duplex structure in mixtures of the guanine- and cytosine-rich regions in order to assess the potential role of the minority structures in gene expression [18; 19].
In this context, we have focused our attention on two cytosine- and guanine-rich sequences located near the promoter region of the n-myc gene. This gene is a member of the myc family of transcription factors and encodes a protein with a basic helix-loop-helix domain. Amplification of this gene is associated with a variety of tumors, most notably neuroblastoma [20]. The cytosine-rich sequence studied here (nmyc01, Table 1) is located at positions -349 to -315 upstream of the first position of the coding region (CDS), and the guanine-rich sequence studied (nmyc02, Table 1) is complementary to nmyc01. These sequences are unique because they show an unusual 12-base-long loop containing two complementary TGCA sequences which, in principle, could hinder the formation of stable intramolecular structures [21]. To our knowledge, no previous work has been done on these sequences. Recently, it was proposed that another guanine-rich sequence, located in the first intron of the n-myc sequence, forms both monomeric and dimeric G-quadruplex structures [22]. A G-quadruplex-duplex structure has also been hypothesized in a thrombin-binding aptamer, although in this case the sequence is an artificial one obtained from SELEX [23].
Circular Dichroism (CD), molecular fluorescence based on molecular beacons, Nuclear Magnetic Resonance (NMR), polyacrylamide gel electrophoresis (PAGE) and molecular absorption spectroscopy were used to monitor the experiments carried out. Multivariate data analysis methods were used to recover qualitative and quantitative information about the species and conformations present in all experiments. Finally, Electrospray Ionization-Mass Spectrometry (ESI-MS) and Size-Exclusion Chromatography (SEC) were used to complement the results obtained from spectroscopy and PAGE [24].

Reagents
The DNA sequences (Table 1) were synthesized on an Applied Biosystems 3400 DNA synthesizer using the 200 nmol scale synthesis cycle. Standard phosphoramidites were used. Ammonia deprotection was performed overnight at 55 °C. The resulting products were desalted by Sephadex G-25 (NAP-10, Amersham Biosciences) and used without further purification. The length and homogeneity of the oligonucleotides were checked by denaturing polyacrylamide gel electrophoresis (PAGE) and reversed-phase HPLC using X-Terra® columns. For the sequence Fnmyc02Q, the quencher Q was introduced at the 3'-end using Dabsyl-CPG solid support (Glen Research) and the fluorophore F at the 5'-end using the fluorescein phosphoramidite Fluoroprime (Amersham Biosciences). DNA strand concentration was determined by absorbance measurements (260 nm) at 90 °C using the extinction coefficients calculated with the nearest-neighbor method as implemented in the OligoCalc webpage [21]. Before any experiment, DNA solutions were first heated to 90 °C for 10 minutes and then allowed to reach room temperature. KCl, KH2PO4, K2HPO4, NaCH3COO, HCl and NaOH (a.r.) were purchased from Panreac (Spain). MilliQ® water was used in all experiments.
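The concentration determination described above rests on the Beer-Lambert law, c = A/(ε·l). The following is a minimal Python sketch of that arithmetic only; the function name, the absorbance reading and the extinction coefficient are hypothetical placeholders, not the values of the sequences in Table 1.

```python
def strand_concentration(absorbance, epsilon, path_cm=1.0):
    """Molar strand concentration from the Beer-Lambert law, A = epsilon * c * l.

    absorbance: absorbance at 260 nm (dimensionless)
    epsilon:    extinction coefficient in M^-1 cm^-1 (e.g. from a nearest-neighbor estimate)
    path_cm:    optical path length in cm
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm)


# Hypothetical example: A260 = 0.60 in a 1 cm cell, epsilon = 3.0e5 M^-1 cm^-1
conc_M = strand_concentration(0.60, 3.0e5, 1.0)
conc_uM = conc_M * 1e6  # convert mol/L to micromol/L
print(f"strand concentration: {conc_uM:.2f} uM")
```

With a 1 mm cell the same absorbance would correspond to a tenfold higher concentration, which is why the path length is kept as an explicit argument.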

Procedures
Absorbance spectra were recorded on an Agilent 8453 diode array spectrophotometer. The temperature was controlled by means of an 89090A Agilent Peltier device. Hellma quartz cells (1 or 10 mm path length, and 350, 1500 or 3000 µL volume) were used. CD spectra were recorded on a Jasco J-810 spectropolarimeter equipped with a Julabo F-25/HD temperature control unit. Hellma quartz cells (10 mm path length, 3000 µL volume) were used. pH was measured with an Orion SA 720 pH/ISE meter and a micro-combination pH electrode (Thermo).
Fluorescence emission spectra were recorded on an Aminco Bowman AB-2 fluorimeter equipped with a cell holder thermostated by a JP Selecta Frigiterm bath. Emission spectra were recorded between 500 and 620 nm and fluorescence intensities were recorded every 1 nm. The excitation wavelength was set to 492 nm, the photomultiplier voltage to 600 V and the excitation and emission band pass to 4 nm. A quartz cell of 10 mm path length and 400 µL volume was used.
Acid-base titrations were monitored either in-line (taking advantage of the stirrer incorporated in the Agilent cell holder) or at-line (in the case of the CD instrument). Experimental conditions were as follows: 25 °C and 150 mM KCl.
Titrations were carried out by adjusting the pH of solutions containing the oligonucleotides. CD and/or absorbance spectra were recorded in a pH stepwise fashion.
Melting experiments were monitored with the Agilent 8453 spectrophotometer equipped with the Peltier unit. The DNA solution was transferred to a covered 10-mm-path-length cell and absorption spectra were recorded at 1 °C intervals with a hold time of 3 min at each temperature value, which yields an average heating rate of approximately 0.3 °C min−1. Buffer solutions were 20 mM phosphate or acetate, and 150 mM KCl. Each sample was allowed to equilibrate at the initial temperature for 30 minutes before the melting experiment was begun.
NMR spectra were acquired on a Bruker Avance spectrometer operating at 600 MHz and equipped with a cryoprobe. Water suppression was achieved by the inclusion of a WATERGATE [22] module in the pulse sequence prior to acquisition.
The chromatographic system consisted of an Agilent 1100 Series HPLC instrument equipped with a G1311A quaternary pump, a G1379A degasser, a G1392A autosampler, a G1315B photodiode-array detector furnished with a 13-μL flow cell, and an Agilent Chemstation for data acquisition and analysis (Rev. A 10.02), all from Agilent Technologies (Waldbronn, Germany). A BioSep-SEC-S 2000 column (300 × 7.8 mm, particle size 5 μm and pore size 145 Å) from Phenomenex (Torrance, CA, USA) was used for the chromatographic separation. The mobile phase was 75 mM potassium phosphate adjusted to pH 7.1. The flow rate was set to 1.0 mL min−1. A volume of 15 μL of the sample was injected and the temperature was set to 25 °C [27]. Absorbance spectra were recorded between 200 and 500 nm.
Polyacrylamide gel electrophoresis was performed at room temperature on 12% non-denaturing gels (19:1 acrylamide:bisacrylamide, Sigma) of 10 × 10.5 cm with a miniVE apparatus (Hoefer) at 10 V/cm for 2 h. Gels and buffers contained 40 mM Tris acetate pH 5.2 or 40 mM Tris acetate pH 8.0. 100 pg of each oligonucleotide was loaded per lane after addition of 10% (w/v) of a loading dye containing 30% glycerol and 0.1% bromophenol blue.
After migration, the gels were stained with SYBR Gold (Molecular Probes) according to the manufacturer's instructions and digitized with a Typhoon 8600 system (Molecular Dynamics).

Data analysis
Spectra recorded during acid-base or melting experiments were arranged in a table or data matrix D, with m rows (spectra recorded) and n columns (wavelengths measured). The goal of data analysis was the calculation of distribution diagrams and pure (individual) spectra for all nc spectroscopically active species considered throughout an experiment. The distribution diagram provides information about the stoichiometry and stability of the species considered (in the case of acid-base and mole-ratio experiments), as well as the thermodynamics of the melting processes. In addition, the shape and intensity of the pure spectra may provide qualitative information about the structure of the species. With this goal in mind, data matrix D was decomposed according to the Beer-Lambert-Bouguer law in matrix form:
\[ \mathbf{D} = \mathbf{C}\,\mathbf{S}^{\mathrm{T}} + \mathbf{E} \]
where C is the matrix (m x nc) containing the distribution diagram, S^T is the matrix (nc x n) containing the pure spectra, and E is the matrix of data (m x n) not explained by the proposed decomposition.
The mathematical decomposition of D into matrices C, S^T, and E may be done basically in two different ways, depending on whether a physico-chemical model is initially proposed (hard-modeling approach) or not (soft-modeling approach) [23].
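The bilinear model D = C S^T + E can be illustrated with a small, noise-free simulation. The sketch below is not the EQUISPEC or MCR-ALS code used in this work; it only shows, in plain Python and for two species, how pure spectra are recovered by linear least squares, S^T = (C^T C)^-1 C^T D, once a concentration profile C is given. All numbers (profiles, band positions, widths) are hypothetical.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical two-species experiment: m = 11 spectra, n = 6 wavelengths.
m, n = 11, 6
frac = [i / (m - 1) for i in range(m)]           # fraction of species 2 in each spectrum
C = [[1.0 - f, f] for f in frac]                 # distribution diagram, m x nc
wavelengths = [240 + 10 * j for j in range(n)]   # nm
S_T = [
    [math.exp(-((w - 260) / 15) ** 2) for w in wavelengths],  # pure spectrum, species 1
    [math.exp(-((w - 280) / 15) ** 2) for w in wavelengths],  # pure spectrum, species 2
]                                                # nc x n
D = matmul(C, S_T)                               # simulated data, so E = 0 here

# Least-squares estimate of the pure spectra for a known C.
Ct = transpose(C)
S_T_est = matmul(matmul(inv2(matmul(Ct, C)), Ct), D)

# Residual matrix E = D - C S^T and its largest absolute element.
D_fit = matmul(C, S_T_est)
max_resid = max(abs(D[i][j] - D_fit[i][j]) for i in range(m) for j in range(n))
print(f"largest residual: {max_resid:.2e}")
```

In a hard-modeling fit, C itself would be generated from equilibrium constants that are refined until the residuals are minimal; in a soft-modeling fit, C and S^T would be estimated alternately under constraints such as non-negativity.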
For melting experiments, the concentrations of the folded and unfolded forms are temperature-dependent. Accordingly, the equilibrium constant depends on temperature according to the van't Hoff equation:
\[ \ln K = -\frac{\Delta H_{\mathrm{vH}}}{RT} + \frac{\Delta S_{\mathrm{vH}}}{R} \]
In this case it is assumed that ΔH_vH and ΔS_vH do not change throughout the range of temperatures studied here.
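As an illustration of how a temperature-independent ΔH_vH and ΔS_vH define a melting curve, the following Python sketch evaluates a two-state model in which K is taken as [unfolded]/[folded] and ln K = −ΔH_vH/(RT) + ΔS_vH/R. The direction chosen for K, the function names and the numerical values of ΔH_vH and Tm are assumptions made for the example, not results from this study.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def unfolding_constant(temp_K, dH, dS):
    """van't Hoff equilibrium constant, K = [unfolded]/[folded] (assumed direction)."""
    return math.exp(-dH / (R * temp_K) + dS / R)


def fraction_unfolded(temp_K, dH, dS):
    """Fraction of strands unfolded in a two-state intramolecular transition."""
    K = unfolding_constant(temp_K, dH, dS)
    return K / (1.0 + K)


# Hypothetical parameters: dH = 60 kcal/mol and Tm = 33 degrees C (306.15 K).
dH = 60.0
Tm = 306.15
dS = dH / Tm  # at Tm, K = 1, hence dS = dH / Tm (kcal mol^-1 K^-1)

for celsius in range(10, 71, 10):
    f = fraction_unfolded(celsius + 273.15, dH, dS)
    print(f"{celsius:3d} C  fraction unfolded = {f:.3f}")
```

For an intramolecular transition the midpoint does not depend on strand concentration, which is the property used later in the text to distinguish folding of a single strand from multimer formation.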
Whenever a physico-chemical model is applied, the distribution diagram in C complies with the proposed model. Accordingly, the proposed values for the equilibrium constants and the shape of the pure spectra in S^T are refined to explain the data in D satisfactorily, whereas the residuals in E are minimized.
In this study, hard-modeling analysis of acid-base and mole-ratio experiments used the EQUISPEC program [24].
Hard modeling of melting experiments was done with a modified version of the Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) procedure that includes the model proposed in equation (4) for the unfolding of intramolecular structures [25].

Results
The solution equilibria of the cytosine-rich region were studied first. Those of the guanine-rich region were then considered. Finally, the potential formation of the Watson-Crick duplex structure from the isolated structures was examined.

Solution equilibria of the cytosine-rich region
To study the influence of pH on the conformational equilibria of nmyc01, acid-base titrations and melting experiments were carried out. An acid-base titration of an nmyc01 sample in the pH range 2.0-7.3 was simultaneously monitored using CD and molecular absorption spectroscopy. A selected set of the measured CD and absorption spectra is shown in Fig. 1a-b and the whole set of experimental spectra is presented in Fig. S1 (Supplementary material). At pH values around 6, the strongly positive band around 286 nm is indicative of the formation of an i-motif structure [31]. To obtain quantitative information, the whole set of experimental CD and molecular absorption spectra was analyzed using a multivariate data analysis method, which enabled calculation of the distribution diagram and pure spectra for each of the acid-base species considered. The results obtained depend strongly on the number of acid-base species considered to be present during the titration. In this case, the analysis was performed assuming the presence of three or four acid-base species. After a careful study of the fitted curves and the resulting residuals, the presence of four acid-base species was proposed, i.e., three transitions throughout the pH range considered. The fits obtained for the acid-base models considering three or four species are shown in the Supplementary material (Fig. S1).
The calculated distribution diagram and pure spectra are shown in Fig. 1c-e. The information contained in both plots helps to explain each of the four proposed acid-base species. The first species, present at a pH higher than 6, would correspond to the neutral form where all bases are deprotonated (i.e., in their neutral form). Its spatial structure is probably a partially stacked single strand. The two major species at pH 5.8 and 3.0 would correspond to two i-motif structures (named i-motif 1 and i-motif 2, respectively) stabilized by C+ · C base pairs. The calculated pure CD spectra for these two species are characteristic of this structure, showing a strong positive band around 285 nm and a weaker negative band around 264 nm. Finally, the major species at pH values lower than 2.0 would correspond to a species in which all (or almost all) cytosine and adenine bases are protonated.
As expected, the formation of the i-motif 1 structure from the neutral form takes place within a narrow pH range (Fig. 1c). In this case, the value for parameter p (Eq. (2)) is 3, in agreement with the observed cooperativity. Similarly, the disruption of i-motif 2 to yield the fully protonated DNA occurs within a very narrow pH range, with a p value equal to 4, and only in a strongly acidic medium. This can be explained by the resistance of the bases involved in the i-motif core to further protonation, which destroys the structure. In contrast, the transition between the two i-motifs is smooth (p value equal to 1), with an apparent pKa around 4.1. Overall, the existence of two i-motifs can be attributed to the protonation of cytosine (pKa around 4.3) and/or adenine (pKa around 3.5 [32]) bases present in the loops. The pure absorption spectrum for i-motif 2 is slightly shifted to longer wavelengths (Fig. 1e) as a result of protonation. Concomitantly, the pure CD spectra of both i-motifs are similar because the core structure is well maintained despite protonation (Fig. 1d). As these bases are not involved in the formation of the C+ · C core and probably do not form any base pairs, their protonation does not have any cooperative effect.
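The link between the value of p and the sharpness of a transition can be made concrete with a Hill-type expression. The sketch below is an assumption about the functional form, because Eq. (2) is not reproduced in this text: it takes the fraction of the protonated form as 1/(1 + 10^(p(pH − pH_mid))) and computes the pH span over which that fraction goes from 10% to 90%. Only the p values (1, 3 and 4) and the midpoint 4.1 come from the text; the function names are illustrative.

```python
import math


def fraction_protonated(pH, pH_mid, p):
    """Hill-type fraction of the protonated species (assumed functional form)."""
    return 1.0 / (1.0 + 10.0 ** (p * (pH - pH_mid)))


def transition_width(p):
    """pH span between 10% and 90% conversion for cooperativity parameter p."""
    return 2.0 * math.log10(9.0) / p


# p values reported in the text for the three transitions of nmyc01
widths = {p: round(transition_width(p), 2) for p in (1, 3, 4)}
for p, width in widths.items():
    print(f"p = {p}: 10-90% conversion spans {width} pH units")

# Midpoint check for the non-cooperative transition (apparent pKa around 4.1)
print(fraction_protonated(4.1, 4.1, 1))
```

Under this assumed form, a transition with p = 1 is spread over almost two pH units, whereas p = 3 or 4 compresses it to roughly half a pH unit, which matches the qualitative description of smooth versus narrow transitions.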
The nature of the species proposed from the spectroscopically monitored acid-base titration was also studied using 1H NMR (Fig. 2, left). At pH 7 and 5 °C, the presence of signals between 12.5 and 14 ppm indicates the formation of Watson-Crick base pairs in nmyc01 [33]. The number of imino signals observed is consistent with the formation of two A · T and two G · C base pairs, and thus with the association, in antiparallel orientation, of the TGCA repeats of the long loop. The characteristic signals of protonated cytosines (~15 ppm) are not observed at this pH. These facts suggest that the structure of the previously proposed neutral strand at pH 7 is stabilized by Watson-Crick base pairs.
At pH 5 and 5 °C, the signals related to the Watson-Crick base pairs are weaker and broader than at pH 7 whereas a clear signal of imino hydrogen at 15.5 ppm denotes the existence of protonated cytosines. This is in accordance with the co-existence of Watson-Crick and C+ · C base pairs. This latter pairing is also present at pH 4. The large line-width of this imino signal (at 15.5 ppm) at pH 4 is most probably due to the presence of several similar conformers in equilibrium. This is not unexpected since one of the cytosine tracts is shorter than the others.
To study the potential presence of this base pairing between the TGCA repeats in the long loop, NMR spectra of a modified sequence (nmyc01m) were recorded (Fig. 2, right). In this case, the second TGCA sequence was replaced by four thymines, which impedes the formation of the suspected stem-loop structure. At pH 7 and pH 5, the signals of such base pairs were dramatically reduced in comparison with the wild-type sequence nmyc01, which confirmed the absence of a stem. Interestingly, at pH 7 and 5 °C, the presence of C+ · C base pairs was detected in the spectrum of nmyc01m, suggesting a stabilization of the i-motif in the mutated sequence. From these results, however, it is not clear whether the Watson-Crick and C+ · C base pairs in nmyc01 are present in a unique structure (an i-motif including a stem-loop stabilized by Watson-Crick base pairing in the long loop) or in a mixture of several species (an i-motif and the neutral strand stabilized by Watson-Crick base pairing). Melting experiments, as well as separation of the DNA species by polyacrylamide gel electrophoresis, were carried out to determine the most plausible situation.
The i-motif structure may be formed by the folding of a single DNA strand or by the association of individual strands. Given the high number of cytosine bases present in the nmyc01 sequence, both intramolecular folding and intermolecular arrangements can be envisaged. Melting experiments were first conducted to assess the existence of an intramolecular transition and to determine the influence of pH on the stability of the i-motif structure. A mathematical procedure was used to analyze the whole set of experimental spectra to obtain information about the number of conformations involved, as well as thermodynamic information related to each individual transition. This procedure is based on the proposal of thermodynamic equations to which spectroscopic data are fitted [30]. As an example, the analysis of spectra recorded during a melting experiment at pH 6.1 is included as Supplementary material (Fig. S2). At pH 6.1, the melting temperature (Tm) was invariant throughout the concentration range 0.5-92 μM, which suggested intramolecular folding under the experimental conditions used in this study.
These results suggest that the proposed distribution diagram (Fig. 1c) may also be valid for the higher DNA concentrations used in NMR measurements (0.17 and 0.58 mM). Therefore, at pH 5, the Watson-Crick and C+ · C base pairs observed are present in a single species (an i-motif including a hairpin stabilized by Watson-Crick base pairing in the long loop), rather than in a mixture of i-motif and the neutral strand stabilized by Watson-Crick base pairing. Table 2 summarizes the thermodynamic parameters calculated over the pH range 3.7-6.4. As expected for a structure in which the core is formed by protonated bases, the melting of the i-motif structure formed by the nmyc01 sequence was strongly pH-dependent (Fig. S3). In the pH range 6.4 to 4.5, Tm values were almost a linear function of pH. Previous studies also found these relationships and showed that Tm was highest at a pH around the pKa of free cytosine, depending on experimental conditions such as ionic strength [31]. The values for the change in enthalpy were maximal around the pKa of free cytosine. Assuming that the disruption of a C+ · C base pair needs around 11 ± 1 kcal/mol [31], it can be deduced that the number of C+ · C base pairs disrupted throughout the pH range 5.8-6.4 is around 5. An increase in the ΔH° value was observed at lower pH values, with a maximal value around the pKa of cytosine.
In order to quantify the effect of the Watson-Crick stem on the stability of the i-motif, the stability of a mutated sequence (nmyc01m) was studied. The Tm values at pH 5.0 and 6.1 were 61 and 33 °C, respectively, similar to those obtained for the wild-type sequence (see Table 2). The changes in enthalpy and entropy, however, were clearly lower than in the first case. As a result, the Gibbs free energy was slightly lower than in the case of the wild-type sequence, especially at pH 5.0.
To gain information about the molecularity and structure of nmyc01 and nmyc01m folding, separation of the DNA species incubated at pH 5. It is tempting to speculate from these results that the stem-loop dictates a preferred intramolecular folding, whereas its absence opens the possibility of forming dimers, trimers and tetramers associated through protonated cytosines.
The influence of salt content on stability was also studied. At pH 6.1, the i-motif structure formed even in the absence of added salt (KCl), as denoted by the characteristic CD signals of the i-motif structure (Fig. S4). No changes were observed upon later addition of salt. By contrast, the melting temperature at pH 6.1 decreased from 52 °C (without added salt) to 33 °C (at 150 mM KCl). This behavior can be attributed to the shift in the pKa of cytosine to higher values in a low-salt buffer [31].

Solution equilibria of the guanine-rich region
Separation of nmyc02 by PAGE was performed under the same conditions as for nmyc01 (Fig. 3b). Such a result strongly supports the idea that the retarded band for nmyc02 could in this case be a stabilized and compact structure folded into an intramolecular G-quadruplex associated with a stem-loop involving Watson-Crick base pairs. As with the i-motif, formation of this stem-loop may direct the formation pathway towards the intramolecular species rather than multimers.
Further, the acid-base equilibria and the thermal stability of the nmyc02 sequence in the pH range 7.1-2.5 were characterized. First, acid-base titration of an nmyc02 sample was carried out and CD and molecular absorption spectra were recorded. Selected spectra and the whole data set are shown in Figs. 4a-b and S5, respectively. At neutral pH, the CD spectrum of nmyc02 is characterized by a positive band at 265 nm and a negative band at 243 nm, the intensity of which is approximately half that of the first band. These features are characteristic of a parallel G-quadruplex structure. The absence of a clear shoulder at 295 nm rules out the presence of mixed antiparallel/parallel structures [34].
Few changes in the CD spectra were observed upon protonation (Fig. 4a), suggesting a stable G-quadruplex structure over the pH range studied. However, a decrease in absorbance at 260 nm and a concomitant increase in absorbance at 280-295 nm were observed (Fig. 4b). The CD and absorbance data recorded over the acid-base titration of nmyc02 were analyzed using EQUISPEC. The whole set of spectra fitted well when a model involving two acid-base equilibria, i.e., three spectroscopically active species, was considered. The pure spectra and the distribution diagram were thus calculated for this number of species (Fig. 4c-e). The pH transition mid-point values were 5.0 ± 0.1 and 3.1 ± 0.2, similar to the pKa values of free cytosine and adenine, respectively. Accordingly, the acid-base species predominant at pH 7 was related to the G-quadruplex in which cytosine and adenine bases remained deprotonated. The major species around pH 4.1 was related to the G-quadruplex in which most of the cytosine bases were protonated whereas most of the adenine bases remained deprotonated. Finally, the structure predominant at pH 2 was related to the G-quadruplex structure in which all cytosine and adenine bases were protonated. Both transitions lack any cooperativity (p equal to 1 in Eq. (2)), a fact that again could be related to the protonation of bases in the loops (i.e., not involved in any base pairing).
As in the case of nmyc01, the nature of the species proposed from the spectroscopically monitored acid-base titration was also studied using 1H NMR (Supplementary material, Fig. S6, left). At pH 7 and 5 °C, the signals at  Fig. S6, right). Indeed, at pH 5 or 7 this oligonucleotide did not show the signal at 13.6 ppm observed with the wild-type sequence, which demonstrates its inability to stably base-pair the TGCA repeats. The preserved signal around 11 ppm, however, indicated the conservation of G-quartets, as seen with PAGE.
Further, several melting experiments of nmyc02 samples were performed within the pH range 3.9-7.1. Data analysis was similar to that previously described for nmyc01. Fig. S7 shows the experimental spectra recorded during a melting experiment at pH 6.1. The trace at 295 nm (inset) was characterized by the low hypochromicity characteristic of G-quadruplex unfolding. As for the unfolding of the i-motif structure, the best fits were achieved when two transitions, i.e., three components or conformations, were considered. The first transition was explained initially in terms of a partial unfolding of the nmyc02 G-quadruplex initially present at 20 °C and pH 6.1, as the existence of the G-quadruplex structure at 70 °C was confirmed by means of CD spectroscopy (data not shown). However, the fact that the magnitude of the first transition, as well as its midpoint, depends on the concentration of DNA (Supplementary material, Fig. S7) points to the presence of DNA aggregates. The second transition corresponded to the complete unfolding of the G-quadruplex to yield the unordered nmyc02 strand. Table 3 [35]. At pH values near 7, the change in enthalpy is around 73 kcal/mol. The change in enthalpy per quartet would be 24 or 18 kcal/mol for a core structure of three or four tetrads, respectively. Table 3 shows that the change in enthalpy corresponding to the second transition increased at pH values lower than The number of tetrads present in the G-quadruplex structure adopted by nmyc02 was also calculated from the measured ESI-MS spectra (Fig. S8). In addition to the peaks associated with DNA ions with a definite m/z ratio, several minor peaks related to DNA · NH4+ adducts were detected. Hence, the peaks at m/z ratios 1807.4 and 1548.8 may be explained when the number of ammonium ions bound equals two (i.e., three tetrads of guanine bases) and the charge z is −6 and −7, respectively [36].
Finally, Size-Exclusion Chromatography was used to complement the results obtained from PAGE and the spectroscopic data, the main goal being the confirmation of super-structures of oligonucleotide multimers associated by G-quartets, a common behavior of G-rich oligonucleotides containing multiple stretches of guanines [27]. Fig. 5 shows the chromatograms recorded for a series of samples with increasing concentration. At 1 μM, the chromatogram showed a main peak around 7.4 min and a shoulder around 6 min. The chromatographic system was calibrated (log MW vs. retention time) with a set of DNA sequences that form linear structures. According to this calibration, the completely linear nmyc02 should elute at 7.1 min. The shift from 7.1 to 7.4 min can be explained in terms of the formation of a G-quadruplex structure, which has a smaller hydrodynamic volume than the linear structure. When the concentration increased, the Gaussian peak at 7.4 min increased accordingly. In addition, the shoulder at 6 min increased concomitantly. This was attributed to aggregates that elute earlier than the G-quadruplex owing to their larger hydrodynamic volumes. When a sample was incubated for a month at 4 °C, the proportion of aggregates increased dramatically (Fig. 5b).
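The two ESI-MS peaks quoted above can be checked for mutual consistency, since different charge states of the same species should give the same neutral mass. The Python sketch below assumes negative-ion mode, in which an ion of charge −z is formed by loss of z protons, so that M = z·(m/z) + z·m(H+); it also assumes that each bound NH4+ replaces one proton, i.e., adds the mass of NH3. These assumptions, the constants and the function name are illustrative; the oligonucleotide mass itself is not given in this text and is not claimed here.

```python
PROTON_MASS = 1.007276   # Da, mass of H+ (approximate)
NH3_MASS = 17.02655      # Da, monoisotopic mass of NH3 (approximate)


def neutral_mass(mz, z):
    """Neutral mass of a species observed as an anion of charge -z at m/z = mz.

    Assumes the charge arises from the loss of z protons (negative-ion ESI).
    """
    return z * (mz + PROTON_MASS)


# Peaks and charge states quoted in the text
peaks = [(1807.4, 6), (1548.8, 7)]

# Neutral mass of the species carrying two ammonium adducts
adduct_masses = [neutral_mass(mz, z) for mz, z in peaks]

# Assumed correction: each NH4+ replaces one H+, adding one NH3 per adduct
dna_masses = [mass - 2 * NH3_MASS for mass in adduct_masses]

spread = max(adduct_masses) - min(adduct_masses)
for (mz, z), mass in zip(peaks, adduct_masses):
    print(f"m/z {mz} (z = -{z}): neutral adduct mass = {mass:.2f} Da")
print(f"difference between charge states: {spread:.2f} Da")
```

Under these assumptions the two charge states agree to within about 2 Da, which is consistent with both peaks arising from the same adduct.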

Competitive equilibria Watson-Crick duplex versus intramolecular structures
To assess quantitatively the potential competition between the Watson-Crick duplex and the intramolecular structures formed by nmyc01 and nmyc02, kinetic, acid-base and melting experiments, as well as SEC and ESI-MS measurements involving mixtures of both sequences, were carried out.
The kinetics of the formation of the duplex was checked by means of molecular beacon technology using a 5′-fluorescein- and 3′-dabsyl-labeled nmyc02 sequence. The maintenance of the G-quadruplex structure in this labeled sequence was checked by means of melting measurements. The determined Tm was 79 ± 1 °C, similar to the value for the unlabeled sequence, which confirmed that the structure was not affected by the addition of fluorescein and dabsyl. Upon addition of the stoichiometric amount of the complementary nmyc01 sequence, an increase in the fluorescence intensity was observed at 37 °C (Fig. S9). This was related to the progressive unfolding of the G-quadruplex to yield the Watson-Crick duplex, and the subsequent distancing of the fluorophore/quencher pair, which enhanced fluorescence. The data fitted a biexponential function with rate constants equal to 0.005 ± 0.002 s−1 and 0.0004 ± 0.0001 s−1. This model suggests the existence of two parallel, rather than two consecutive, reactions. The proposed mechanism and the calculated values for the rate constants are of the same order of magnitude as those calculated for the unfolding of a 22-nt human telomere quadruplex in 25 mM KCl, 10 mM phosphate, pH 7.2 and 20 °C [21]. Finally, the formation of the Watson-Crick duplex was completed after incubation at 37 °C overnight.
In order to study the formation of the duplex structure throughout a wide pH range, spectroscopically monitored acid-base titrations of nmyc01:nmyc02 mixtures were carried out. As an example, the results obtained after analysis of a 1:1 nmyc01:nmyc02 mixture are shown here (Figs. 6 and S10). Analysis of the experimental data revealed the presence of four spectroscopically active species in the pH range 2.2-7.1. The explanation for the proposed acid-base species was not straightforward because some of these actually corresponded to mixtures of two or more of the five different nmyc01 and nmyc02 acid-base species previously described. However, the assignment of the major species at pH 7 to the Watson-Crick duplex was not in doubt. The major species at pH 5 was identified as a mixture of G-quadruplex and i-motif structures. The species that appeared at pH values lower than 3 was assigned to a mixture of G-quadruplex and unstructured nmyc01. Finally, the major species at pH 4 was identified as a mixture of G-quadruplex and protonated i-motif.
Spectroscopically-monitored melting of a previously incubated 1:1 mixture of nmyc01 and nmyc02 at pH 7. °C, a minor contribution of the duplex cannot be ruled out, which is in accordance with the proposed distribution diagram (Fig. 6). Fig. 8a shows a set of chromatograms recorded for nmyc01, nmyc02 and several mixtures at pH 7.0 and 25 °C. The sequence nmyc01 eluted at 7.2 min. Upon addition of increasing amounts of nmyc02, the intensity of this peak at 7.2 min decreased whereas a new peak appeared around 6.7 min. This peak was attributed to the Watson-Crick duplex. This retention time fits perfectly into the calibration plot for linear sequences mentioned above. It should be noted that the peak at 5.6 min related to the presence of multimeric structures formed by nmyc02 was not reduced upon formation of the Watson-Crick duplex and the annealing procedure. Indeed, it is possible that particularly heat-resistant G-quadruplex structures were not denatured properly, as suggested by their presence even after a prolonged incubation at 99 °C in the presence of KCl (see lanes 1 and 5 in Fig. 3b). At pH 6.1, however, the addition of nmyc02 to nmyc01 did not produce a high yield of Watson-Crick duplex. Two months later, the peak associated with the duplex (at 6.7 min) had clearly increased whereas that associated with the multimers (at 5.8 min) had decreased.
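To give a feeling for the time scales implied by the two fitted rate constants, the sketch below converts them to half-lives and evaluates a normalized biexponential curve for two parallel first-order processes. The rate constants are those quoted above; the equal weighting of the two phases and the function names are assumptions, since the fitted amplitudes are not given in this text.

```python
import math

K_FAST = 0.005    # s^-1, reported rate constant of the faster phase
K_SLOW = 0.0004   # s^-1, reported rate constant of the slower phase


def half_life(k):
    """Half-life (s) of a first-order process with rate constant k (s^-1)."""
    return math.log(2.0) / k


def fraction_converted(t, weight_fast=0.5):
    """Normalized biexponential signal for two parallel first-order processes.

    weight_fast is the hypothetical fraction of the signal change carried by
    the faster phase.
    """
    w = weight_fast
    return 1.0 - w * math.exp(-K_FAST * t) - (1.0 - w) * math.exp(-K_SLOW * t)


half_lives = (half_life(K_FAST), half_life(K_SLOW))
print(f"fast phase: t1/2 = {half_lives[0]:.0f} s")
print(f"slow phase: t1/2 = {half_lives[1] / 60:.1f} min")

after_12h = fraction_converted(12 * 3600)
print(f"fraction converted after 12 h: {after_12h:.6f}")
```

With a slow-phase half-life of roughly half an hour, the conversion is essentially complete after an overnight incubation, in line with the observation reported above.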

Discussion
The study of non-canonical DNA structures is of great interest because of their potential role in some diseases and aging. Concomitantly, the number of G-quadruplex-forming regions observed in the eumetazoa for which complete genomic sequences are available has increased rapidly [37]. Recently, after decades of research in vitro, the in vivo presence of G-quadruplexes has been proven [16]. On the other hand, the requirement of low pH values for the formation of stable i-motif structures seems to be an obstacle for their formation in vivo. However, the existence of proteins that specifically bind to cytosine-rich sequences has already been demonstrated [38]. In addition, it has been proposed that i-motifs could form in the presence of crowding agents [39], proteins [4,40] or even under slightly basic conditions at low temperature and in the absence of added salt [41]. Under negative supercoiling, the i-motif forms under physiological conditions, and in this case it is more likely that stabilizing capping interactions may drive the formation of a favored i-motif [8].
In this work, the solution equilibria of two particular cytosine- (nmyc01) and guanine-rich (nmyc02) regions found in the promoter region of the n-myc gene were studied. Both sequences, which have not been studied before, contain a duplicate of the TGCA sequence separated by two nucleotides, and are thus capable of forming a hairpin stabilized by Watson-Crick base pairs. The cytosine-rich sequence forms two intramolecular i-motifs that are stable throughout the pH range 2-7, with maximal stability at pH 4.5. Under physiological conditions of pH and temperature, the relative concentration of the i-motif structure is small. The difference between the two i-motifs depends on the protonation of additional bases in the loops. Our results also show that the guanine-rich region forms an intramolecular parallel G-quadruplex that is stable throughout the studied pH range (2-7). Finally, the competition between the intramolecular structures (G-quadruplex and i-motif) and the intermolecular Watson-Crick duplex was studied, revealing that the Watson-Crick duplex is the predominant form at pH values above 6. Fig. 9 depicts a schematic view of the two proposed intramolecular structures. The precise determination of the three-dimensional solution structure of all sequences studied here is beyond the scope of this manuscript.
The sequences studied here correspond to the wild-type sequences, and contain cytosine and guanine tracts of unequal length. They are thus expected to form multiple conformers that could interconvert at temperatures below Tm, as has been observed for i-motif structures formed within the HIF-1α proximal promoter [42]. This could explain the broad NMR signals observed for the i-motif and G-quadruplex.
Accurate determination of the solution structures will require the systematic mutation of bases located in these tracts in order to reduce the conformational space sampled by the wild-type sequences.
The existence of two different i-motif structures, which mainly differ according to the protonation of bases in the loops, was previously proposed for a cytosine-rich sequence in the promoter region of the bcl-2 gene [5]. As in this previous work, the current study demonstrated the utility of a multivariate approach to extract quantitative information (distribution diagram) from the measured CD and molecular absorption spectra. Monitoring a process such as acid-base titration at just one wavelength clearly leads to the loss of valuable information.
The most striking characteristic of both quadruplex structures is the existence of a long loop that incorporates a short stretch of Watson-Crick base pairs. To our knowledge, no previous report has been published describing any i-motif structure with such a long hairpin loop. In a recent article, Brazier et al. reported the extraordinary stability of a cytosine-rich region in the PDGF-A gene which contains six cytosine tracts ranging from 2 to 13 bases in length [42]. This stability, however, was explained there in terms of long cytosine-rich loop regions, rather than by the formation of intramolecular hairpins. In our case, the existence of this loop stabilizes the structure in terms of ΔH⁰, ΔS⁰ and ΔG⁰. However, this stabilization is not directly related to an increase in the melting temperature. The argument that an increase in Tm can be directly related to an increase in stability at a given temperature has previously been debated [43,44]. Obviously, at 25 °C and pH 5.0, both structures are folded.
However, the unfolding of the wild-type sequence (nmyc01) is completed over a narrower temperature range than that of the mutated sequence (nmyc01m). As a consequence, the structure involving the Watson-Crick stem inside the long loop has a higher stability at 25 °C and pH 5.0 than the structure without that stem. This suggests that the hairpin contributes to the stability of the overall structure. Our PAGE analysis further suggests that this hairpin stabilizes one particular motif, favoring the intramolecular folding of nmyc01 and/or nmyc02 and consequently decreasing the intermolecular interactions that give rise to DNA multimers of high molecular weight, which are widely reported in the field of G-quadruplex-forming structures. This behavior is particularly interesting in a biological context where formation of this stem-loop, repeated on both strands, could greatly contribute to the folding of both i- and G-tetraplexes upon local unwinding of the n-myc promoter.
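The link between a narrower melting transition and a higher stability below Tm can be made explicit with a short derivation. It is only a sketch, assuming a two-state unimolecular unfolding with temperature-independent ΔH⁰ and ΔS⁰ (no heat capacity change); θ denotes the unfolded fraction and K the unfolding equilibrium constant, symbols introduced here for illustration.

```latex
\begin{align}
\Delta G^{0}(T) &= \Delta H^{0} - T\,\Delta S^{0},
  \qquad \Delta G^{0}(T_m) = 0 \;\Rightarrow\; \Delta S^{0} = \frac{\Delta H^{0}}{T_m} \\
\Delta G^{0}(T) &= \Delta H^{0}\left(1 - \frac{T}{T_m}\right) \\
\theta &= \frac{K}{1+K}, \qquad K = \exp\!\left(-\frac{\Delta G^{0}}{RT}\right) \\
\frac{d\theta}{dT} &= \theta\,(1-\theta)\,\frac{\Delta H^{0}}{RT^{2}}
  \;\Rightarrow\;
  \left.\frac{d\theta}{dT}\right|_{T_m} = \frac{\Delta H^{0}}{4RT_m^{2}}
\end{align}
```

Under these assumptions, a steeper (narrower) transition at a comparable Tm implies a larger ΔH⁰, and therefore a larger ΔG⁰ of unfolding at any temperature below Tm, even though Tm itself is not higher.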
The data on the thermal stability of i-motifs has led to the proposal of two classes (I and II) into which currently known i-motifs can be grouped. Class I structures consist of short loop regions between cytosine tracts, whereas class II structures contain longer loop regions between cytosine tracts [18]. The classification of the i-motif formed by nmyc01 points to an intermediate situation between these two classes. The transitional pH from the single strand is 6.5, which classifies it as a class II i-motif. However, the thermal stability at pH 7 and the proposed short loops 2 and 3 indicate that it is a class I i-motif.
The reported data on the stability of G-quadruplexes show that, in general, the most stable in terms of Tm values are those containing single-nucleotide lateral loops between the G-quartets [45]. The G-quadruplex structure that could be formed by the nmyc02 sequence contains at least one such loop. In addition, the presence of two single-nucleotide loops within a quadruplex-forming sequence constrains the structure to a parallel fold, which is independent of the length of the remaining loop (up to three nucleotides) [45]. In the case of nmyc02, the presence of a longer loop does not prevent the formation of a parallel G-quadruplex, as shown by CD spectroscopy. On the contrary, the formation of a short stem-loop greatly stabilizes a single, monodisperse structure, as seen by gel electrophoresis. Other wild-type guanine-rich sequences, like those corresponding to the hypoxia inducible factor 1α promoter [46] and c-myc [47], also show parallel folding despite containing relatively long loops. Concerning the formation of a hairpin, to our knowledge only one parallel G-quadruplex structure containing a hairpin in a loop has been described previously, namely that found at the hTERT core promoter [48]. As in the case of nmyc02, the long loop likely forms a stable hairpin structure, which would explain the unexpected stability of both G-quadruplex structures.
The first transition observed in the melting of nmyc02 was explained initially in terms of partial unfolding of the nmyc02 G-quadruplex present at 20 °C and pH 6.1. This transition would involve some unstacking of bases located in the loops or at the 5′ or 3′ ends of the nmyc02 sequence. However, in view of the SEC results, the first transition could be related to the breaking of the aggregates at temperatures lower than the melting temperature. The formation of multimers has been proposed for other sequences [49], including a sequence lacking a long loop in the n-myc gene (5′-TAG₃CG₃AG₃AG₃A₂-3′) [22]. In this last work, the melting of the dimeric form was not reflected in temperature-dependent UV absorbance profiles. Again, the application of a multivariate approach allowed the resolution of a complete distribution diagram for the unfolding of a complex mixture. Finally, the presence of multimers for the nmyc02 sequence is consistent with the observation that two parallel processes occur during the formation of the Watson-Crick duplex from the folded G-quadruplex, as indicated by the fluorescence and SEC measurements.
This work has also shown that the Watson-Crick duplex is the predominant species in the mixture at pH 7 and 25 °C. However, small amounts of intramolecular structures are present at this pH, their contribution being higher than that of the duplex at pH values lower than approximately 6.1. The pH value is, as a consequence, a key variable modulating the equilibrium between the intra- and intermolecular species. Dysregulated pH is known to be an adaptive feature of most cancers, regardless of their tissue origin or genetic background. In normal differentiated adult cells, intracellular pH is generally lower (around 7.2) than the extracellular pH (around 7.4). However, cancer cells have a higher intracellular (around 7.4) and a lower extracellular pH (6.7-7.1) [50]. In these conditions, cytosine-rich sequences may adopt i-motif structures and modulate the formation of the other nucleic acid structures.
Table 1. Sequences studied in this work.
Figure 1c-e showed the results obtained when four species were considered. From the calculated distribution diagram and pure spectra it is possible to calculate the reproduced CD and absorbance data. In this case, the calculated CD signal at 288 nm (green line) and the experimental (blue symbols) superimpose, which supports the four-species model.
S2. Melting experiment of nmyc01 at pH 6.1.
Figure S2a shows the spectra recorded during a melting experiment at pH 6.1. The trace at 295 nm indicated a hypochromic transition, characteristic of the unfolding of i-motif structures at a pH higher than the pKa of cytosine, with a transition midpoint around 30 °C.
i.e., three different species or conformations of nmyc01, were considered. The first transition in Figure S2b, which was accompanied by a large hypochromicity at 295 nm, corresponded to the unfolding of the i-motif, which was the major species at pH 6.1 and 20 °C, to yield a partially stacked strand. The Tm of this transition was 33±1 °C.
The calculated changes in enthalpy and entropy were 52 kcal·mol⁻¹ and 170 cal·K⁻¹·mol⁻¹, respectively. The second transition, which was mainly denoted by the variation in absorbance at 260 nm, was explained in terms of a loss of stacking upon heating.
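The reported enthalpy and entropy changes can be checked for consistency with the reported Tm of 33±1 °C. The short Python sketch below does this under the assumption of a two-state unimolecular transition with temperature-independent ΔH and ΔS, for which Tm = ΔH/ΔS; the function names are illustrative, and the values are taken as referring to unfolding.

```python
def melting_temperature_celsius(dh_kcal_per_mol, ds_cal_per_k_mol):
    """Tm = dH / dS for a two-state unimolecular transition (assumed model)."""
    return dh_kcal_per_mol * 1000.0 / ds_cal_per_k_mol - 273.15


def delta_g_kcal_per_mol(dh_kcal_per_mol, ds_cal_per_k_mol, temp_celsius):
    """dG(T) = dH - T*dS, with T converted to kelvin and dS to kcal/(K*mol)."""
    return dh_kcal_per_mol - (temp_celsius + 273.15) * ds_cal_per_k_mol / 1000.0


DH = 52.0    # kcal/mol, value reported above
DS = 170.0   # cal/(K*mol), value reported above

tm = melting_temperature_celsius(DH, DS)
dg_20 = delta_g_kcal_per_mol(DH, DS, 20.0)   # at the starting temperature, 20 °C
print(f"Tm = {tm:.1f} °C, dG(20 °C) = {dg_20:.2f} kcal/mol")
```

The ratio gives a Tm of about 32.7 °C, which agrees with the reported 33±1 °C within the stated uncertainty.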

The goodness of the proposed model is shown in the experimental vs. calculated absorbance values at 295 nm next
Figure 4c-e showed the results obtained when three species were considered. From the calculated distribution diagram and pure spectra it is possible to calculate the reproduced CD and absorbance data. In this case, the calculated absorbance signal at 295 nm (green line) and the experimental (blue symbols) superimpose, which supports the three-species model.
When only two acid-base species were considered, the calculated distribution diagram and fits are shown here. The calculated absorbance signal at 295 nm clearly does not fit the experimental values.