Understanding the effect of the nature of the nucleobase in the loops on the stability of the i-motif structure

a. Department of Analytical Chemistry, University of Barcelona, Diagonal 645, E-08028 Barcelona, Spain
b. Institute of Physical Chemistry “Rocasolano”, CSIC, Serrano 119, E-28006 Madrid, Spain
c. Institute of Molecular Biology of Barcelona (IBMB-CSIC), Baldiri i Reixac 4-8, 08028 Barcelona, Spain
d. Institute for Advanced Chemistry of Catalonia (IQAC-CSIC), CIBER-BBN, Jordi Girona 18-26, E-08034 Barcelona, Spain
e. BIOESTRAN, associated unit UB-CSIC
Electronic Supplementary Information (ESI) available: it includes experimental information and detailed figures. See DOI: 10.1039/x0xx00000x
Received 00th January 20xx, Accepted 00th January 20xx


Introduction
The i-motif structure is the only known DNA structure that consists of parallel-stranded duplexes held together by intercalated base pairs (Figure 1). The formation of the C•C+ base pair requires the protonation of one of the cytosines at N3, the pKa value of which is around 4.5, depending on temperature and ionic strength. For this reason, stable i-motif structures are usually observed in a slightly acidic medium 1. [...] 3, 4 The study of these structures is interesting not only from a biophysical and biomedical point of view, but also for their potential application in nanotechnology 5. I-motifs have been employed in the design of in vivo sensing systems 6, artificial DNA-based nanomachines 7 and logic gates 8, among other applications. Recently, it has been demonstrated that the rational manipulation of i-motif composition can be used to design pH sensors and to tune both their response range and their response sensitivity 9.

The number and nature of the bases located in the loops may affect the stability of i-motif structures. It has been suggested that these structures may be classified into two groups depending on the length of the loops 10: i-motifs with short loop regions between cytosine tracts (class 1), and i-motifs with longer loop regions (class 2). Class 2 structures have been reported to be more stable than class 1 structures, although recent results have cast some doubt on the generality of this effect 11. Using sequences that contain only thymine and cytosine bases, it has been reported that shorter loops exhibit the highest stability. In this sense, the flexibility associated with long loop regions could be detrimental to the stability of the i-motif structure, and long loop regions would be stabilizing only if they can form additional intramolecular interactions that limit flexibility. However, it should be noted that this additional flexibility could also bring an additional favourable entropic contribution to the folding free energy.

The nature of the loops also influences the structure and stability of the i-motif. For example, it has been observed that the presence of adenine bases in the loops produces conformational changes involving disruption of the i-motif core 12, 13. In contrast, the mutation of the TAA repeats to TTT in the intramolecular i-motif structure formed by the cytosine-rich strand of the human telomere increases its stability 14. The influence of bases adjacent to the cytosine tracts, as well as the effect of loop length, has been addressed in a recent work 15. There, it was concluded that: i) guanine or thymine in the loop positions adjacent to cytosine tracts in lateral loops stabilizes the i-motif; ii) the same number of bases in lateral loops is optimal for stabilization; iii) an additional C•C+ base pair also stabilizes the i-motif. To our knowledge, the present work is the first study focused on the influence of the nature of the internal bases located in the lateral loops on the thermal and acid-base stabilities of i-motif structures. In this work we use the sequence 5'-TT CCC TXT CCC TTT CCC TXT CCC TT-3', where X is T, A, C or G (Figure 1), as a model to study these interactions.

Apparatus
CD spectra were recorded on a Jasco J-810 spectropolarimeter equipped with a Julabo F-25/HD temperature control unit. Hellma quartz cells (10 mm path length, 3000 μL volume) were used. NMR spectra were acquired either on a Bruker Avance spectrometer operating at 600 MHz (sequences CC, AA, GG, and CG) or on a Bruker Digital Avance spectrometer operating at 800 MHz (sequences TT, TA, CA and TG). Water suppression was achieved by including a WATERGATE 16 module in the pulse sequence prior to acquisition. Absorbance spectra were recorded on an Agilent 8453 diode array spectrophotometer. The temperature was controlled by means of an Agilent 89090A Peltier device. Hellma quartz cells (1 or 10 mm path length, and 350, 1500 or 3000 μL volume) were used. The chromatographic system consisted of an Agilent 1100 Series HPLC instrument equipped with a G1311A quaternary pump, a G1379A degasser, a G1392A autosampler, a G1315B photodiode-array detector furnished with a 13-μL flow cell, and an Agilent Chemstation for data acquisition and analysis (Rev. A 10.02), all from Agilent Technologies (Waldbronn, Germany). A BioSep-SEC-S 3000 column (300 × 7.8 mm, particle size 5 μm and pore size 290 Å) from Phenomenex (Torrance, CA, USA) was used for the chromatographic separation at room temperature.

Reagents
The DNA sequences (Figure 1) were obtained from Sigma-Aldrich (St. Louis, Missouri, USA). Two control sequences were also synthesized for PAGE analysis (control 1: 5'-TT CCC TTT AAA TTT CCC TTT CCC TT-3', and control 2: 5'-TT CCC TTT AAA TTT AAA TTT CCC TT-3'). DNA strand concentration was determined by absorbance measurements (260 nm) at 90 °C using the extinction coefficients calculated with the nearest-neighbor method as implemented on the OligoCalc webpage.
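The concentration determination described above follows the Beer-Lambert relation c = A / (ε·l). The sketch below illustrates that arithmetic only; the extinction coefficient and absorbance are hypothetical placeholders, not values from this work, and in practice ε would be taken from a nearest-neighbor calculator such as OligoCalc.

```python
def strand_concentration(absorbance_260, epsilon_260, path_length_cm=1.0):
    """Return the molar strand concentration from A260 using c = A / (epsilon * l)."""
    if epsilon_260 <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance_260 / (epsilon_260 * path_length_cm)


# Hypothetical values for illustration only (not taken from the paper).
EPSILON_260 = 200_000.0  # M^-1 cm^-1, placeholder extinction coefficient
conc_molar = strand_concentration(0.40, EPSILON_260, path_length_cm=1.0)
print(f"Strand concentration: {conc_molar * 1e6:.2f} uM")
```

Measuring at 90 °C, as stated in the text, means the strand is unfolded, which is the state that nearest-neighbor extinction coefficients are meant to describe.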

PAGE analysis
The oligonucleotides were suspended in water and diluted to a concentration of 5 µM in a solution buffered at pH 5.0 (50 mM sodium acetate pH 5.0, 10 mM potassium acetate) or at pH 7.8 (50 mM Tris acetate pH 7.8, 10 mM potassium acetate). The solutions were then heated for 5 min at 95 °C and cooled down to room temperature. Glycerol was added (2.5% v/v) and 5 µL of each sample was loaded on 10 × 10.5 cm, 12% non-denaturing polyacrylamide gels (19/1 acrylamide/bisacrylamide, Sigma) buffered either with 50 mM trisodium citrate adjusted to pH 5.0 with citric acid or with 0.5x Tris-borate-EDTA (TBE), pH 8.0. Electrophoresis was run at 4 °C with a miniVE apparatus (Hoefer) for 4 h at 11 V/cm. Trisodium citrate (0.1 M, pH 5.0) was used as running buffer for the acidic gel and 0.5x TBE for the other. The gel buffered at pH 5.0 was then incubated for 15 min in 1x TBE and rinsed with water, and both gels were stained with SYBR Gold (Molecular Probes) according to the manufacturer's instructions and digitized with a Typhoon 8600 system (Molecular Dynamics).

Size-Exclusion Chromatography
The mobile phase was 300 mM KCl and 20 mM buffer (phosphate or acetate) adjusted to the desired pH value. The flow rate was set to 1.0 mL min−1. The injection volume was 15 μL.
Absorbance spectra were recorded between 200 and 500 nm.
Cytosine was used as a marker of the permeation limit (retention time, tR, equal to 12.5 min). The T15, T20, T25 and T30 sequences were used as standards to construct the tR vs. log(MW) calibration plot for unfolded sequences. Standards were injected twice to assess the reproducibility of the tR values. At all pH values studied, the relative difference between tR values for a given standard was lower than 0.5%.
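The tR vs. log(MW) calibration described above amounts to a straight-line fit. A minimal sketch in plain Python follows; the molecular weights are approximate and the retention times are invented for illustration, so neither should be read as data from this study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mw(t_r, slope, intercept):
    """Invert the calibration line to get an apparent molecular weight."""
    return 10 ** ((t_r - intercept) / slope)


def relative_difference_percent(t1, t2):
    """Relative difference between duplicate injections, in percent."""
    return abs(t1 - t2) / ((t1 + t2) / 2) * 100


# Assumed, approximate molecular weights (g/mol) of the T15-T30 standards.
standard_mw = [4501.0, 6022.0, 7543.0, 9064.0]
# Hypothetical retention times (min); larger strands elute earlier.
standard_tr = [9.05, 8.65, 8.35, 8.10]

slope, intercept = fit_line([math.log10(mw) for mw in standard_mw], standard_tr)
print(f"tR = {slope:.3f} * log10(MW) + {intercept:.3f}")
print(f"Apparent MW at tR = 9.40 min: {apparent_mw(9.40, slope, intercept):.0f}")
```

Because the calibration is built from unfolded standards, a compact folded strand would elute later than the line predicts for its true mass and so return a lower apparent MW.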

Analysis of melting data
Absorbance data as a function of temperature were analyzed as described elsewhere 18. The physico-chemical model is related to the thermodynamics of DNA unfolding. Hence, for the one-step unfolding of intramolecular structures such as those studied here, the corresponding equilibrium constant may be written as:

$$K_{\mathrm{unfolding}} = \frac{[\mathrm{unfolded}]}{[\mathrm{folded}]} \quad (1)$$

For thermal stability studies, the concentration of the folded and unfolded forms is temperature-dependent. Accordingly, the equilibrium constant depends on temperature according to the van't Hoff equation:

$$\ln K_{\mathrm{unfolding}} = -\frac{\Delta H_{vH}}{RT} + \frac{\Delta S_{vH}}{R} \quad (2)$$

It is assumed that ΔH_vH and ΔS_vH do not change throughout the range of temperatures studied here.
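As an illustration of how the van't Hoff relation (2) translates into a melting curve, the sketch below computes the folded fraction for a two-state unfolding with temperature-independent ΔH and ΔS. The parameter values are hypothetical and chosen only to give a Tm of 50 °C; the function names are not from the original analysis software.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def unfolding_constant(temp_k, dh_vh, ds_vh):
    """K_unfolding from ln K = -dH/(R*T) + dS/R (dH in J/mol, dS in J/(mol K))."""
    return math.exp(-dh_vh / (R * temp_k) + ds_vh / R)


def fraction_folded(temp_k, dh_vh, ds_vh):
    """Folded fraction for a two-state process, with K = [unfolded]/[folded]."""
    return 1.0 / (1.0 + unfolding_constant(temp_k, dh_vh, ds_vh))


def melting_temperature(dh_vh, ds_vh):
    """Temperature (K) at which K = 1, i.e. half of the strands are folded."""
    return dh_vh / ds_vh


# Hypothetical parameters for illustration only.
DH_VH = 300_000.0          # J/mol
DS_VH = DH_VH / 323.15     # J/(mol K), chosen so that Tm = 323.15 K (50 C)

for temp_c in (25.0, 45.0, 50.0, 55.0, 75.0):
    f = fraction_folded(temp_c + 273.15, DH_VH, DS_VH)
    print(f"{temp_c:5.1f} C  folded fraction = {f:.3f}")
```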

Analysis of spectra recorded along acid-base titrations
Spectra recorded during acid-base experiments were arranged in a table or data matrix D, with m rows (spectra recorded) and n columns (wavelengths measured). The goal of data analysis was the calculation of distribution diagrams and pure (individual) spectra for all nc spectroscopically-active species considered throughout an experiment. The distribution diagram provides information about the stoichiometry and stability of the species considered (in the case of acid-base and mole-ratio experiments), as well as the thermodynamics of the melting processes. In addition, the shape and intensity of the pure spectra may provide qualitative information about the structure of the species. With this goal in mind, data matrix D was decomposed according to the Beer-Lambert-Bouguer law in matrix form:

$$D = C\,S^{T} + E \quad (3)$$

where C is the matrix (m x nc) containing the distribution diagram, S^T is the matrix (nc x n) containing the pure spectra, and E is the matrix of data (m x n) not explained by the proposed decomposition.

The mathematical decomposition of D into matrices C, S^T, and E may be conducted in two different ways, depending on whether a physico-chemical model is initially proposed (hard-modelling approach) or not (soft-modelling approach) 19-23. For hard-modelling approaches, the proposed model depends on the nature of the process under study. Hence, for acid-base experiments the model will include a set of chemical equations describing the formation of the different acid-base species from the neutral species, together with approximate values for the stability constants, such as the following: [...] (5)

In this equation, the parameter p is related to the Hill coefficient and describes qualitatively the cooperativity of the equilibrium. Values of p greater than 1 indicate the existence of a cooperative process. Whenever a physico-chemical model is applied, the distribution diagram in C complies with the proposed model. Accordingly, the proposed values for the equilibrium constants and the shape of the pure spectra in S^T are refined to explain the data in D satisfactorily, while the residuals in E are minimized. In this study, hard-modelling analysis of acid-base experiments used the EQUISPEC program 24.
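The hard-modelling idea described above can be sketched for a single cooperative protonation step. The code below is not EQUISPEC and does not reproduce the paper's model equation; it assumes a generic Hill-type expression for the distribution diagram, builds D = C·S^T from invented pure spectra, and then recovers S^T by linear least squares for a fixed C. All numbers are hypothetical.

```python
def hill_fractions(ph, ph_mid, p):
    """Assumed Hill-type model: [protonated, deprotonated] fractions at a given pH."""
    protonated = 1.0 / (1.0 + 10.0 ** (p * (ph - ph_mid)))
    return [protonated, 1.0 - protonated]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def solve_spectra(c, d):
    """Least-squares pure spectra S^T = (C^T C)^-1 C^T D for two species."""
    ct = transpose(c)
    g = matmul(ct, c)  # 2 x 2 normal-equation matrix
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    return matmul(g_inv, matmul(ct, d))


# Hypothetical titration: five pH values, transition midpoint 6.5, p = 2.
ph_values = [7.5, 7.0, 6.5, 6.0, 5.5]
C = [hill_fractions(ph, 6.5, 2.0) for ph in ph_values]   # m x nc
S_T_true = [[0.2, 0.8, 0.5],                             # nc x n, invented spectra
            [0.6, 0.3, 0.1]]
D = matmul(C, S_T_true)                                  # noise-free data, E = 0
S_T_fit = solve_spectra(C, D)
print(S_T_fit)
```

In a full hard-modelling fit, the midpoint and p would also be refined iteratively so as to minimize E; here they are held fixed to keep the linear step visible.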

CD spectra
[...] C•C+ base pairs (Figure 3a). Other NOEs characteristic of i-motif structures are clearly observable (i.e., H1'-H1' cross-peaks in Figure 3b). The absence of signals between ~12.5 and ~14 ppm indicates that Watson-Crick base pairs are not formed in any of the sequences studied, including those containing complementary bases, TA (ESI) and CG (Figure 2). Interestingly, all sequences exhibit sharp imino signals around 11 ppm. These signals arise from loop thymines or guanines, and they are observable at relatively high temperature, as are the imino signals of the hemiprotonated cytosines. This suggests that loop residues adopt a defined structure. In most cases, NOESY cross-peaks between some of these imino signals indicate the formation of T•T mismatches (Figure 3c). Some differences are observed between NMR spectra acquired at pH 3.5 and 5.0, these changes being more pronounced in sequences other than TT (Figure 2). It is also worth noting the significant differences in cytosine imino chemical shifts (illustrated in Figure S2.3) between oligonucleotides whose sequences differ only in loop nucleotides.

PAGE and SEC analysis
The overall conformational heterogeneity of the sequences at several pH values was studied by non-denaturing Polyacrylamide Gel Electrophoresis (PAGE) (Figure 4). The migrations at basic pH (8.3) of several sequences prone to form i-motif structures were similar to those of control sequences that cannot form this structure, which is indicative of unfolded oligonucleotides. On the other hand, at pH 5.0 the sequences containing the four CCC tracts migrated faster than the two unfolded control oligonucleotides. This acceleration is indicative of a more compact structure, and thus of an intramolecularly folded i-motif structure 25. The same pattern of bands was observed for all the sequences considered in this study, which indicates a common folding pattern with similar stabilities under these conditions. Finally, the Size-Exclusion Chromatography (SEC) chromatograms showed that all sequences eluted as a single folded structure at pH 5.3 and 300 mM KCl (ESI), under non-denaturing conditions. On the other hand, elution at pH 7.1 indicated that all sequences were present as an unfolded strand. Chromatographic analyses done at pH 4.7 showed significant peak tailing, which was explained as a result of the strong adsorption of DNA sequences on the silica surface due to the protonation of silanol groups 26 (ESI).

Acid-base titrations
Next, to explore the effect of these bases on the pH-induced formation of i-motif structures, spectrophotometrically-monitored acid-base titrations were carried out in the pH range 2-7 at 25 °C. Figure 5 shows the normalized absorbance values at 295 nm, a wavelength indicative of i-motif formation. Starting at pH 7, the absorbance for the sequence TT shows two leaps occurring in narrow pH ranges, with pH-transition midpoints at ~6.5 and ~2.7, respectively. Other sequences, such as TA, CA or TG, show similar behaviour. In contrast, the sequence AA shows a continuous transition from pH 7 to pH 2.5, denoting the presence of at least one additional protonation process in this pH range. Finally, other sequences, such as GG, CC or GC, show an intermediate behaviour.

To gain further insight into the acid-base equilibria of these sequences, the recorded spectra were arranged in a data matrix D and analysed by a multivariate data analysis method. This allowed the determination of the number of acid-base species and, for each of these species, its pure spectrum and its individual concentration profile. The pure absorption spectrum provides qualitative information about the protonation state of the bases in the species. The individual concentration profile, in turn, provides information about the pH range in which the species exists, as well as the cooperativity associated with its formation (Table 1). Figure 6 shows the calculated distribution diagrams and pure spectra for sequences TT and AA. The titration of TT comprises two acid-base equilibria, i.e., it may be explained by the linear combination of three acid-base species. Both equilibria are characterized by a high degree of cooperativity. The equilibrium observed at pH ~6.5 involves the i-motif and the unfolded strand in which cytosines are fully deprotonated. As the acid-base titrations were carried out at 25 °C, and the melting temperature of i-motif structures in this pH range is relatively low (~30 °C, see below), a mixture of species is expected. On the other hand, the equilibrium at pH ~2.7 involves the i-motif and the unfolded strand in which cytosines are fully protonated.

The introduction of bases other than thymine in the loops gives rise to an additional protonation process with an apparent pKa value close to those of cytosine (~4.5) or adenine (~3.5). A detailed inspection of the traces (Figure 5) indicates that this intermediate protonation process is much more pronounced in AA than in other sequences, such as TA, CA or TG, where the effect is very small. The calculated distribution diagram and pure spectra for the AA sequence also reveal the presence of this intermediate transition (Figure 6). The existence of this additional transition was validated by fitting the experimental data to a two pH-transition model (ESI). The data not explained by this three-component model indicated that the intermediate transition actually exists. In addition, a spectroscopically-monitored titration of the T25 sequence, which contains only thymine bases (pKa around 9.5), was carried out (ESI). As expected, no transition was observed in the pH range from 7.4 to 2.6.

Interestingly, the nature of the bases in the loops also affects the equilibrium involving the i-motif and the strand with deprotonated cytosines (Figure 5 and ESI). The TT and TG sequences show the highest pH-transition midpoints (6.5 and 6.4, respectively), whereas AA, GG and TA show the lowest values (~6.2). As a result of the effects on the two pH transitions, the sequences with purines in both lateral loops (AA and GG) fold into the i-motif in the narrowest pH range (~3.4 pH units) (Table 1).

To gain deeper knowledge about the effect of temperature on the acid-base equilibria, acid-base titrations of the TT, AA, and CA sequences were also carried out at 37 °C (ESI). The multivariate analysis showed that the number of transitions and the cooperativity were similar to those observed at 25 °C. However, the pH-transition midpoint for the first transition (the one that involves the unfolded strand) was shifted to lower pH values, whereas the other transitions remained practically unaltered. For TT, the first transition was [...], whereas the second transition took place at pH 2.7 at both temperatures. For AA, the first transition was shifted from pH 6.15 to 5.86, whereas the second and third transitions remained practically unaltered (pH 4.5 vs. 4.6, and pH 2.4 vs. 2.5). This fact was related to the low thermal stability of i-motif structures at pH values near neutral (see below).
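To make the notion of a pH window of i-motif existence concrete, the sketch below uses the TT midpoints quoted above (~6.5 and ~2.7) in a simplified three-species picture. It assumes two well-separated Hill-type transitions and a hypothetical cooperativity p = 2 for both; the fitted values in Table 1 are not reproduced here, so the curve is illustrative only.

```python
def protonated_fraction(ph, ph_mid, p):
    """Assumed Hill-type fraction that has crossed a transition with midpoint ph_mid."""
    return 1.0 / (1.0 + 10.0 ** (p * (ph - ph_mid)))


def imotif_fraction(ph, upper_mid, lower_mid, p_upper=2.0, p_lower=2.0):
    """Approximate i-motif fraction between two well-separated transitions.

    The i-motif forms below the upper midpoint and is lost below the lower
    one, so its fraction is taken as the difference of the two Hill terms.
    """
    formed = protonated_fraction(ph, upper_mid, p_upper)
    lost = protonated_fraction(ph, lower_mid, p_lower)
    return max(0.0, formed - lost)


def window_width(upper_mid, lower_mid):
    """Width of the pH range between the two transition midpoints."""
    return upper_mid - lower_mid


TT_UPPER, TT_LOWER = 6.5, 2.7  # midpoints quoted in the text for TT at 25 C

print(f"TT window width: {window_width(TT_UPPER, TT_LOWER):.1f} pH units")
for ph in (7.0, 6.5, 4.6, 2.7, 2.0):
    print(f"pH {ph:.1f}: i-motif fraction = {imotif_fraction(ph, TT_UPPER, TT_LOWER):.3f}")
```

Lowering the upper midpoint, as reported for AA on going from 25 to 37 °C (6.15 to 5.86), narrows this window from the neutral side while leaving the acidic edge essentially unchanged.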

Melting studies
Next, the thermal stability of the i-motif structures formed by the sequences considered was evaluated. To this end, spectrophotometrically-monitored melting experiments were carried out at different pH values. The recorded melting curves exhibit a single melting transition at 295 nm, with little difference between the heating and cooling curves (ESI). The general lack of hysteresis and the sigmoidal shape of the curves are indicative of reversible i-motif formation and relatively fast folding kinetics 27. Under these conditions, reliable thermodynamic parameters can be calculated from these experiments 28.
It is possible to detect the presence of intermediates by using multivariate data analysis methods, since a process involving more than two states should be revealed by these procedures. For example, Figure 7 shows the results of the analysis of the melting data of the CC sequence at pH 5.6. From visual inspection of the trace at 295 nm (inset in Figure 7a), a Tm value slightly lower than 50 °C may be determined. Table 2 summarizes the thermodynamic parameters of this folding, which have been calculated by using the usual methods based on a two-state process. Singular Value Decomposition (SVD) analysis suggested the presence of four spectroscopically-active components in the data. These four components may be related directly to the unfolding process or may just reflect baseline drifts. In this case, a recently published approach based on hybrid modelling was applied 29. Figures 7b and 7c show the calculated distribution diagram and pure spectra for each of the four components considered. The "green" component is related to the initial i-motif structure, whereas the "magenta" component corresponds to the unfolded strand. The calculated Tm and ΔH values ([...]) agree with the values calculated from the classical univariate analysis. The components plotted in black explain baseline drifts at low and high temperatures, respectively. The residuals (Figure 7d; matrix E in equation 3) are both small (around 0.001 absorbance units) and randomly distributed, which means that the proposed model explains the experimental data satisfactorily. In summary, the unfolding of the CC sequence under these experimental and instrumental conditions may be explained satisfactorily as a two-state process.

This procedure has also been applied to other sequences at pH values where a mixture of acid-base species has been proposed (ESI). In both cases, it has been possible to resolve the initial mixture of folded acid-base species and to calculate the thermodynamic parameters related to the unfolding process. The resolved spectra for the folded species, as well as the relative concentrations, are not exactly the same as expected from the acid-base distribution diagram. It is very difficult to match all data when the pure spectra of the components in the mixture are so similar 30, 31. Moreover, the distribution diagram shows that these components overlap in the temperature direction, which makes the exact resolution of the system more difficult.

In terms of Tm values, CC forms the most stable structure throughout the pH range studied 15, whereas the GG and TG sequences form the least stable structures. The inclusion of additional cytosine bases in the loops slightly increases the stability of the i-motif, whereas the inclusion of purines clearly reduces its stability. The inclusion of a potential W•C base pair (GC or TA sequences) does not have a significant effect on the stability compared to the control sequence, TT. Analysis of the thermodynamic parameters shows that the values of ΔH [...], it must be concluded that there are additional interactions contributing to the i-motif stability, like proton transfer equilibria or solvation. These contributions may be due, at least in part, to interactions involving loop residues. For example, T•T base pairs are observed in the NMR experiments. T•T base pairs have been found in many i-motif structures 33. In addition to the two hydrogen bonds formed in each T•T base pair, these base pairs are isomorphous with the C•C+ ones and fit well in the intercalative structure, contributing to its stability through favourable stacking interactions. In general, an increase in the ΔH° value was observed at lower pH values, with a maximal value around pH ~4.5. As a general trend, ΔG° values at 37 °C were found to be proportional to pH (ESI). In terms of ΔG° at 37 °C, the AA and GG sequences are the least stable structures in the pH range 4.6-5.6. On the other hand, the CC sequence shows a higher stability throughout the whole pH range studied.

Altogether, the results shown here suggest that the nature of the bases in the loops not only affects the stability of i-motif structures with respect to temperature, but also produces structural modifications that can be detected by studying the acid-base behaviour. In terms of Tm values, the CC, TT and CA sequences form the most stable i-motif structures at pH 5.6, whereas the AA and GG sequences form the least stable ones. Therefore, it may be concluded that the thermal stability of the i-motif structure decreases when two purine bases are placed opposite each other. Concomitantly, the results obtained from the acid-base titrations show that the AA and GG sequences exhibit the lowest stability with respect to pH changes. These sequences fold into the i-motif structure in a narrower pH range than the others (2.8-6.2 vs. 2.4-6.5). It has also recently been found for duplex DNA that the presence of mismatches produces significant local structural alterations, especially in the case of purine•purine mismatches 34, due to their greater size.
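The free-energy comparison at 37 °C discussed above rests on ΔG°(T) = ΔH° − TΔS° for a two-state process. A minimal sketch of that arithmetic follows, written for the unfolding direction so that a positive ΔG° means the folded i-motif is favoured; the enthalpy and Tm used are hypothetical and do not come from Table 2.

```python
def entropy_from_tm(dh_kj, tm_celsius):
    """Two-state entropy change (kJ mol^-1 K^-1) implied by dH and Tm, since dG(Tm) = 0."""
    return dh_kj / (tm_celsius + 273.15)


def unfolding_free_energy(dh_kj, ds_kj_per_k, temp_celsius=37.0):
    """dG(T) = dH - T*dS for unfolding, in kJ/mol; positive means the fold is stable."""
    return dh_kj - (temp_celsius + 273.15) * ds_kj_per_k


# Hypothetical values for illustration only.
DH_UNFOLD = 300.0   # kJ/mol
TM_C = 50.0         # degrees Celsius

ds_unfold = entropy_from_tm(DH_UNFOLD, TM_C)
dg_37 = unfolding_free_energy(DH_UNFOLD, ds_unfold, 37.0)
print(f"dS = {ds_unfold * 1000:.1f} J/(mol K), dG(37 C) = {dg_37:.2f} kJ/mol")
```

Comparing sequences by ΔG° at a common temperature, rather than by Tm alone, accounts for differences in ΔH° as well as in melting temperature.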

Conclusions
Our results suggest that the thermal stability of intramolecular i-motifs and its pH dependence are clearly related to the composition of at least two of their loops. The presence of purine bases destabilizes the structure, whereas cytosine bases in these loops confer more stability at acidic pH. An additional acid-base transition has been observed at pH around 4.5 for all the sequences considered except TT, which contains only cytosine and thymine bases. The pH range in which i-motif structures exist depends both on the nature of the loops and on the temperature at which the measurement is done. Increasing the temperature lowers the pH at which the transition between the i-motif and the unfolded, deprotonated sequence takes place. The different pH ranges in which these intramolecular i-motifs can fold may be useful for the development of pH-responsive systems, such as selective delivery mechanisms based on molecular switches, biosensors and nanomachines.

Figure 1. (a) Cytosine-protonated cytosine base pair; (b) Hypothetical scheme of the intramolecular i-motif structures adopted by the sequences studied in this work; X = A, C, G or T bases; (c) Sequences studied in this work.

The formation of i-motif structures was first assessed by means of Circular Dichroism (CD) experiments carried out at several pH values and 25 °C. The overall spectral signature characteristic of the i-motif structure (a negative band around 265 nm and a stronger positive band around 287 nm) was clear for all the sequences considered (Electronic Supplementary Information, ESI). The CD spectra show practically no variation with pH for several sequences, such as TT. On the other hand, the CD spectra of other sequences, such as AA or GG, show clear changes with pH. Other sequences, such as CG, TA or CA, among others, show an intermediate behavior. These observations point to local modifications of the i-motif structure due to the nature of the internal bases in the loops.

NMR spectra
I-motif formation was then demonstrated by Nuclear Magnetic Resonance (NMR) (Figure 2 and ESI). The imino proton resonances at ~15 ppm indicate the presence of protonated cytosines and the formation of C•C+ base pairs. The number of signals and their chemical shifts are very similar in spectra acquired at different temperatures, ranging from 5 °C to 45 °C (ESI), indicating that the structures remain unaltered in this temperature range. In general, spectra recorded at pH 3.5 exhibit sharper and more dispersed signals than those acquired at pH 5.0 (Figure 2).

Figure 3. Regions of NOESY spectra of CC at pH 4.0 and T = 5 °C, 90:10 H2O/D2O (mixing time: 150 ms). Buffer conditions were 20 mM potassium phosphate, 100 mM KCl, pH 4. a) Detail of the hemiprotonated cytosine imino region, showing the formation of six C•C+ base pairs. b) Region of the NOESY spectra showing H1'-H1' correlations characteristic of i-motif structures. c) Detail of the imino region of the NOESY spectra, showing cross-peaks between thymine imino protons. d) Imino-imino cross-peaks between adjacent C•C+ base pairs.

Figure 4. PAGE analysis. The migrations at pH 8.0 of the sequences incubated at pH 7.8 show equal mobility, indicative of unfolded oligonucleotides. The migrations at pH 5.0 of the sequences incubated at pH 5.0 show an acceleration of those sequences containing the four CCC tracts (lanes 3-10) as compared with the two control oligonucleotides (lanes 1-2).
In fact, analyses done at pH 3.0 and 4.0 were unsuccessful because of the complete disappearance of chromatographic peaks. The results obtained so far indicate that the nature of the bases in the sequences considered does not have a dramatic influence on the folding and molecularity of the i-motif structures formed.
This journal is © The Royal Society of Chemistry 20xx. Phys. Chem. Chem. Phys., 2015, 00, 1-3

Figure 5. Absorbance values recorded during the acid-base titration of the sequences studied at 25 °C.

Figure 6. Calculated distribution diagrams (matrix C) and pure spectra (matrix S) from multivariate analysis of data recorded along the acid-base titrations of TT (a and b, respectively) and AA (c and d, respectively) sequences at 25 °C.

Figure 7. Spectra recorded along the melting of the CC sequence at pH 5.6 (a). Inset: absorbance trace at 295 nm. Calculated distribution diagram (matrix C) (b) and pure spectra (matrix S) (c). Absorbance data in matrix E not explained by the proposed model (d). The green and magenta components correspond to the folded and unfolded strands, respectively. The black components explain baseline drifts not related to the unfolding event.

Table 1. Calculated parameters from the multivariate analysis of spectra recorded along spectrophotometrically-monitored acid-base titrations at 25 °C.
the stability compared to the control sequence, TT. Analysis of the thermodynamic parameters shows that the values of ΔH [...]