How abasic sites impact hole transfer dynamics in GC-rich DNA sequences

Changes in DNA charge transfer properties upon the creation of apurinic and apyrimidinic sites has been used to monitor DNA repair processes, given that such lesions generally reduce charge transfer yields. However, because these lesions translate into distinct intra and extrahelical conformations depending on the nature of the unpaired base and its DNA context, it is unclear the actual impact of such diverse conformations on charge transfer. Here we combine classical molecular dynamics, quantum/molecular mechanics (QM/MM) calculations, and Kinetic Monte Carlo simulations to investigate the impact of abasic sites on the structure and hole transfer (HT) properties of DNA. We consider both apurinic and apyrimidinic sites in polyG and polyGC sequences and find that most situations lead to intrahelical conformations where HT rates are significantly slowed down due to the energetic disorder induced by the abasic void. In contrast, the presence of an unpaired C flanked by C bases leads to an extrahelical conformation where stacking among G sites is reduced, leading to an attenuation of electronic couplings and a destabilization of hole states. Interestingly, this leads to an asymmetric HT behavior, given that the 5’ to 3’ transfer along the G strand is slowed down by one order of magnitude while the opposite 3’ to 5’ transfer remains similar to that estimated for the reference polyG sequence. Our simulations thus suggest that electrochemical monitoring of DNA repair process following changes in charge transfer properties can miss repair events linked to abasic sites adopting extrahelical conformations.


Introduction
The loss of a purine or pyrimidine base in DNA, resulting in the creation of an abasic site, is one of the most frequent DNA lesions, with a ratio of formation reaching thousands of times per day in a human cell. 1 Abasic sites are formed by spontaneous or enzymatic hydrolysis of the N-glycosidic bond, [2][3][4] by chemical modifications of nucleic bases, or by physical agents like UV or γ radiation. 36][7][8] In general, sequences containing abasic sites can adopt unusual conformations, making them fragile and susceptible to breaks. 9,10Luckily, cells are equipped with an elaborate repair machinery to neutralize the proliferation of cytotoxic and promutagenic defects.Among such repair mechanisms, base excision repair (BER) starts with the formation of an abasic site due to the excision of a damaged or mispaired nucleobase by the action of a DNA glycosilase. 11,12The process then continues with the recognition of the abasic site by an apurinic/apyrimidinic (AP) endonuclease 13,14 and finishes with the replacement of the missing nucleotide.6][17][18] The study of the different steps involved in the repair process is crucial and may have vast consequences in cancer research.On the other hand, there is a growing interest in the synthetic incorporation of abasic sites due to its application to the development of biosensors.Abasic site biosensors have been used to detect analytes using fluorescent and electrochemical probes by selectively binding to the unpaired nucleobase, 19,20   while electrochemical monitoring of DNA repair processes is also possible following the change on the charge transfer (CT) properties associated with the creation of abasic sites.
Indeed, the CT properties of DNA have been found to be strongly sensitive to the presence of abasic sites, which in general lead to a decrease in the CT yield. 22,23How abasic sites modulate CT kinetics on DNA duplexes, however, has been only briefly explored. 24Many electrochemical studies, for example, have used abasic sites as a brake for the CT process.The Slinker group proved the DNA repair function of two DNA glycosylases by measuring the current though a DNA monolayer after and before the damaged base excision, showing a substantial decrease in redox signal after the formation of the abasic site. 21This group also demonstrated an attenuation of the CT yield in the presence of an abasic site in of CT properties on DNA sequence, the absence of a nucleobase is expected to directly modulate CT rates and pathways.However, the unusual conformations induced by the defect can lead to concomitant changes in the CT parameters (site energies and couplings) of proximate nucleobases, thus indirectly modulating CT kinetics.Indeed, a variety of studies indicate that the unpaired nucleobase can adopt intra and extrahelical conformations, depending on the nature of the unpaired base and its context in the DNA sequence.

25-37
In this study, we combine classical molecular dynamics (MD) simulations with semiempirical INDO/S quantum/molecular mechanics (QM/MM) calculations to investigate how an abasic site influences the structure and hole transfer (HT) kinetics of six model GC-rich DNA duplexes.In particular, we investigate the impact of a guanine (G) or cytosine (C) deletion in 5'-GXG-3' and 5'-CXC-3' contexts and compare our results with the analogous unmodified DNA 15-mers.We aim at understanding how these defects distort the double helix structure, the site energies and electronic couplings of the proximate nucleobases, and the overall HT process, depending on the DNA sequence.Our results indicate that in situations where the unpaired nucleobase adopts intrahelical conformations, the overall HT process is slowed down by ~1-2 orders of magnitude.However, when an unpaired C adopts an extrahelical conformation the transfer becomes similar to the reference DNA in the 5' to 3' transfer direction along the G strand, but it is slowed down by 1 order of magnitude in the opposite direction, thus introducing a preferential route for oxidative damage.

MD simulations
MD simulations at room temperature (300 K) were performed for the six 15-mer DNA duplexes shown in Fig. 1, terminated at Please do not adjust margins Please do not adjust margins both 3' and 5ʹ ends by hydroxyl groups.These systems represent polyG and polyGC central sequences where either a G or a C base has been deleted leading to an abasic site.The sequences were capped with GCGC moieties to keep base pairing in the ending regions and thus prevent strand slippage, which has been found to be enhanced by the presence of abasic sites. 38We adopted the prototypical abasic site, tetrahydrofuran (X), commonly used as a model for enzymatically generated abasic sites, which exist as mixtures of α and β hemiacetals, due to its chemical stability. 28,37MD simulations were performed starting from standard B-DNA fiber geometries created using the NAB module of the Amber12 software, 39 eventually modified in order to delete a C or G nucleobase and generate the abasic site.The systems were neutralized adding Na+ ions 40 and solvated in a truncated octahedron box of TIP3P water (buffer region 14 Å).Two sets of simulations were run either adopting the parmbsc0 41 and the recently developed parmbsc1 42 force fields for DNA, the latter introducing some improvements, for example, related to excessive terminal fraying.The tetrahydrofuran residue was modelled using parmbsc0 and parmbsc1 parameters for the phosphate and sugar moieties, the latter capped with a hydrogen atom, which charge was adjusted to give a global charge of -1.The solvent and counterions where first energy minimized during 1000 steps, followed by 2500 steps of further minimization of the full system.Then, the systems were gradually thermalized up to 300 K during 200 ps with weak constrains in the nucleic acids (10 kcal/mol•Å 2 ), followed by 200 ns production runs at 1 bar.For the GX' system, production runs were extended up to 350 ns.All simulations were done using the Amber 12 suite of programs 39 using an integration time step of 2 fs, the SHAKE algorithm to restrain bonds involving hydrogen, periodic boundary conditions, the Particle Mesh Ewald approach, and a nonbonded cutoff of 10 Å.For each system, we extracted a total of 1000 structures at equal intervals from the last 100 ns of MD trajectories to be used as input in QM/MM calculations.In addition, we performed a cluster analysis based on the root-mean square deviations (RMSD) along the last 100 ns of trajectory.From this analysis, a representative structure for each system was extracted from the most populated cluster.

QM/MM calculations
QM/MM calculations were performed using the semiempirical INDO/S method on the structures extracted from the MD trajectories.Then, we applied Koopman's approximation to estimate HT energies and electronic couplings. 43Previous benchmark studies have shown that application of this scheme at the INDO/S level of theory 44,45 provides surprisingly good estimates of both HT energies and couplings in DNA stacks compared to high-level MS-CASPT2 data, 46 while keeping a moderate computational cost that allows accounting for structural dynamics, which significantly affect CT properties in DNA. 43In particular, the semiempirical INDO/S method has been shown to perform well in calculations of HT parameters, while the MNDO scheme and related approaches like AM1 or PM3 considerably underestimate couplings values. 44We note here that the MS-CASPT2 benchmarks were obtained using a level shift parameter of 0.2, whereas no IPEA shift was used.In all calculations the QM region included the 7 base pairs in the central region of the duplex (excluding the sugar and backbone atoms), as indicated in Fig. 1, where the C1′ atom of the sugar unit was replaced by a hydrogen atom with a 1.08 Å bond length.
Electrostatic embedding effects exerted by the environment (DNA, solvent, and counterions) were considered adopting the charges as defined in the force field used for the MD simulations.Thus, we do not account for explicit polarization effects in the MM region.MM polarization has been shown to introduce a considerable impact on reorganization energies, 47 but in this study we do not compute this quantity, as explained in the next section.Instead, we expect minor changes on HT couplings, which are mediated by orbital overlap, 48 and on HT energies, as here the energies of the radical cation states are not explicitly calculated.In fact, they are obtained from an MD simulation of the neutral QM system including 7 base pairs, so explicit MM polarization is expected to introduce small variations on the relative energies of the sites.Hole states associated to the G sites in each duplex were represented by six or seven highest-occupied molecular orbitals (HOMOs) of the neutral species (HOMO → HOMO-5/HOMO-6), while CT couplings were estimated using the Fragment Charge Difference method (FCD): 49 For hole transfer (HT), adiabatic splittings  !−  ! were estimated using the one-electron energies of the HOMOs of neutral systems.Assuming a two-state-model, free energy differences for the HT reaction were then estimated as In Eq. ( 1), Δ ! and Δ ! are the difference of hole charges on the donor and acceptor sites in the adiabatic states of interest and Δ !" is the corresponding off-diagonal term.The calculation of these quantities and the underlying approximations are described in previous studies. 49,50

Hole transfer kinetics
HT rates for the forward and reverse reactions shown in Fig. 1 were estimated using Marcus theory: (2) This journal is © The Royal Society of Chemistry 20xx Please do not adjust margins Please do not adjust margins The driving force of the HT reaction (Δ ! ) was estimated from the energy difference in donor/acceptor ionization potentials, and the reorganization energy (), which includes internal and solvent terms, was assumed to be 0.7 eV in all cases. 514][55] Whereas a different choice of  would change the absolute HT timecales we predict, the relative HT rates between standard DNA and the analogous sequences including an abasic site are more robust and essentially unaffected by the value of .All rates were computed using the energies and squared couplings averaged over the MD trajectories.The effective rates corresponding to the overall HT process G 1 → G 5 and G 5 → G 1 were simulated using a Kinetic Monte Carlo (KMC) algorithm. 56HT rates were used to generate a cumulative probability distribution function (CPD) for each possible step.This function is associated with a transition i → n according to Eq. ( 3), where  !" ! is the sum of transfer rates from the same origin i.
The Monte Carlo hopping algorithm consists of the following steps: 1) Compute the CPD function for each allowed transfer.
2) Start at a given site i (G 1 or G 5 ), choose a random number 0 < r 1 ≤ 1 and hop to site j, where CPD i(j) ≥ r 1 > 0.

DNA structure
6][27][28][29][30][31][32][33][34][35][36][37] The bulk of these studies indicate that the overall B-DNA structure of the duplex is maintained in the presence of an abasic site.However, the lesion increases structural flexibility, and the local conformation of the unpaired base depends on its nature and its sequence context.Thus, whereas purine unpaired nucleobases adopt intrahelical conformations that preserve π stacking, pyrimidines stack poorly and tend to be extrahelical, although intrahelical conformations have also been identified when the pyrimidine is flanked by two guanines.Thus, it is important to assess the reliability of the structural models obtained here from MD simulations before investigating the impact of abasic sites on hole transfer dynamics.
In Fig. 2 we show the positional RMSD of the DNAs computed with respect to the initial B-DNA structure along the MD simulations performed using either the parmbsc1 or the parmbsc0 force fields.In all cases, neglecting terminal base pairs only lead to minor variations in RMSD values, indicating that terminal fraying was small.If we focus on the standard DNA sequences without the lesion (G and GC systems), simulations performed using the refined parmbsc1 potential leads to RMSD values ∼1 Å lower than those obtained using the parmbsc0, thus keeping the structures closer to the initial B-DNA.The presence of an abasic site (GX, GX', GCX and GCX' Please do not adjust margins Please do not adjust margins systems) also leads to differences in the structures described by both force fields, again with parmbsc1 displaying more stable structures characterized by lower RMSD values.For the GCX system, both force fields lead to similar intrahelical position for the unpaired guanine site, as shown in Fig. 3, with the sugar of the abasic site flipping in and out of the duplex, in accord with NMR data and theoretical simulations performed for an unpaired G in the same CGC 25,27   or similar CGT contexts. 36When the unpaired guanine is flanked by other G bases (GX system), both force fields again predict intrahelical conformations, in accord with previous studies of unpaired guanines in GGG 27,37 or GGA sequences.

28
In this case, however, the simulation performed using parmbsc1 shows a strong disruption of the Watson-Crick hydrogen bonds between the G 2 and C 2 base pairs adjacent to X, with the unpaired G 3 and C 2 forming non-Watson-Crick hydrogen bonds.This pattern is also briefly exchanged with similar interactions between G 3 and C 4 (the base 5' to X).The observation of this interaction between the unpaired base and a nucleobase adjacent to X agrees with previous predictions for this GGG sequence 27 and was also observed for an unpaired cytosine in a GCA context. 28In a recent study, NMR and constrained MD simulations also pointed to such non-Watson-Crick pattern for the same GGG sequence when a 2′- Please do not adjust margins Please do not adjust margins deoxyribose abasic site was considered, whereas, for its chemically stable analogue tetrahydrofuran, as used here, the standard Watson-Crick pattern was kept. 37We note however that the sequence of the DNA in that study differs from the one considered here and contained also an additional 8oxoguanine lesion.
On the other hand, when the unpaired site is a cytosine in a GCG context (GCX' system) the parmbsc1 potential describes a stable intrahelical conformation, in agreement with previous predictions, 27 whereas parmbsc0 leads to both intrahelical and extrahelical conformations characterized by large distortions of the overall stacking pattern.Intrahelical conformations have previously been observed for an unpaired C in a similar GCA context 28 or for an unpaired T flanked by guanines.
Finally, when the unpaired cytosine is located in a CCC sequence (GX' system), both force fields initially predict a stable intrahelical conformation.However, with the parmbsc1 potential at ~180 ns, the orphan base C 3 flips out of the helix, which collapses allowing the flanking bases to stack.More in detail, the carbonyl group of the extrahelical C 3 establishes a hydrogen bond with the amino group of C 2 , which keeps its Watson-Crick hydrogen bonds with G 2 thus leading to a triplexlike G 2 :C 2 :C 3 structure, in which the G 4 :C 4 base pair is mainly stacked with the C 2 :C 3 pair.Extension of the simulations for a total of 350 ns confirmed the stability of this conformation, in agreement with previous NMR data 25 and theoretical predictions 27 pointing out to such extrahelical configuration.
6][27][28][29][30][31][32][33][34][35][36][37] Moreover, in Fig. 4 we show superimposition of the backbone of the DNA duplexes studied, indicating that the abasic site in the GX, GX', GCX and GCX' duplexes does not alter the global B-like conformation, in accord again with previous studies.Thus the description of the structural impact of abasic sites seems to be quite robust when the refined parmbsc1 force field is adopted, and in the following, we use the trajectories sampled with this potential to investigate the impact of these lesions on hole transfer dynamics.

Charge transfer kinetics
HT in DNA is known to be rather sensitive to structural deformations of the double helix, especially those modifying the degree of stacking between nucleobases.We have thus computed the energies of the hole states as well as the  Please do not adjust margins Please do not adjust margins corresponding HT couplings for the guanines located in the central regions of the duplexes, as shown in Fig. 1, in order to investigate how this lesion affects HT dynamics.In Fig. 5 we show schematically the relative energy of different guanine sites in each system, whereas in Table 1 we report the energy differences among sites and the corresponding electronic coupling parameters and HT times.Based on the HT times for individual transfer steps, we then used KMC simulations to simulate the overall HT times reported in Table 2.We simulated both forward and backward processes, starting from G 1 and recording the time needed to reach G 5 or the inverse transfer process.
In the absence of any lesion, the HT transfer process in the GC system follows several pathways involving both intra and interstrand hops, given the relatively similar energies and electronic couplings involved, although interstrand hops are slightly favored.In this case, the overall forward (G 1 → G 5 ) and backward (G 5 → G 1 ) times are ~2 ns.In the G system the transfer only involves intrastrand hops, characterized by much larger electronic coupling values ~100 meV, thus leading to a faster transfer with times 0.2 and 0.04 ns for forward and backward processes.In the polyGC sequence, the presence of the abasic site (GCX and GCX' systems) does not lead to major structural changes in the double helix, as shown in Fig. 3, with the unpaired base adopting an intrahelical conformation.In the GCX' sequence, indeed, the electronic couplings relevant for the HT process (V 12 , V 24 , and V 45 ) remain very close to the GC reference system, and the main impact of the lesion is a significant stabilization ~150 meV of G 4 with respect of G 2 .This stabilization slows down hops from G 2 to the neighboring sites, and together with the lack of the intermediate G 3 leads to a slower overall time of ~300 ns compared to the GC duplex.In the GCX system, in contrast, the unpaired G 3 is partially displaced toward the abasic site void (see Fig. 3), leading to a better stacking with G 2 and G 4 which increases V 23 and V 34 interstrand couplings from 9 to 21 meV and from 24 to 32 meV, respectively, as found previously for a similar sequence. 57On the other hand, V 24 decreases from 8 to 2 meV, because in this case there is no C 3 mediating the coupling between G 2 and G 4 .In this case, however, the abasic site induces an important destabilization of the hole states on G 3 and G 4 .Thus, although interstrand couplings are enhanced compared to the reference GC system, the HT process is again considerably slower with total times ~100-150 ns due to the larger time needed to access those sites.
In the G system, the abasic site leads to qualitatively different structural deformations in GX and GX'.As discussed previously, in GX the Watson-Crick hydrogen bonds between the G 2 and C 2 base pairs are disrupted, with the unpaired G 3 forming non-Watson-Crick hydrogen bonds with C 2 3' to the abasic site, as shown in Fig. 3.This conformational change does not alter much the couplings among guanines compared to the G system, which remains close to ~100 meV, but leads to an Please do not adjust margins Please do not adjust margins important energetic destabilization of G 2 and G 3 hole states and a corresponding increase in the times needed to reach these sites.Thus, the overall HT times, both for the forward and backward pathways, are increased to ~2 ns compared to the 0.04-0.2ns obtained for the same sequence without the lesion.
In the GX' system, on the other hand, the lesion induces an important conformational change, with unpaired C 3 adopting an extrahelical conformation and the sugar of the abasic site being extruded from the helix.As discussed in the previous section, this leads to the triplex like G 2 :C 2 :C 3 structure shown in Fig. 6, in which the G 4 :C 4 base pair is mainly stacked with the C 2 :C 3 pair, but also to some degree with the G 2 :C 2 pair.
Because the stacking between G 1 and G 2 is not significantly modified, the V 12 intrastrand coupling remains similar to the value estimated for G and GX, around 90 meV.The V 24 and V 45 couplings, however, are approximately halved compared to the standard ~100 meV coupling value for intrastrand G-G contacts because the stacking between G 2 /G 4 and G 4 /G 5 is reduced for these pairs.This reduced stacking explains the partial destabilization of G 2 , G 4 and G 5 shown in Fig. 4, given the well-known ability of GG motifs to stabilize hole states.

58
KMC simulations indicate that destabilization sites, specially of G 5 , induces a significant asymmetry in overall HT times for the forward G 1 → G 5 and the backward G 5 → G 1 process, for which we estimate time constants of 1.8 ns and 23 ps, respectively, compared to 210 ps and 38 ps estimated for the G system without lesion.Overall, thus, the partial disruption of the staking interactions along the G strand in GX' leads to fast forward HT times similar to the reference G system, whereas the backward process is significantly slowed down by one order of magnitude, inducing a directional asymmetry in HT dynamics.

Conclusions
Apurinic and apyrimidinic sites constitute one of the most frequent DNA lesions.Indeed, the repair of other lesions by the BER machinery leads also to the formation of an abasic site after enzymatic hydrolysis of the N-glycosidic bond.Electrochemical monitoring of DNA repair processes is thus often performed following the modulation of charge transfer properties associated with the creation of abasic sites.Despite the fact that several studies indicate a general attenuation of charge transfer yield in the presence of abasic sites, the fact that such lesions lead to diverse alterations in the DNA structure depending on the nature of the unpaired nucleobase and the DNA context suggests that the consequences on charge transfer dynamics are also diverse.In this study, we have investigated the impact of unpaired cytosine and guanine sites on the structure of polyG and polyGC sequences in atomic detail using MD simulations.Individual HT rates are then derived from Marcus theory based on MD-averaged energies and couplings estimated using QM/MM calculations, and the impact of abasic sites on overall HT dynamics are modeled using a Kinetic Monte Carlo scheme.
We find that MD simulations based on the parmbsc0 force field lead to large structural fluctuations and unusual conformations for the DNA sequences considered.In contrast, those performed using the recently refined parmbsc1 potential point to small alterations on the global B-DNA structure, with abasic sites adopting both intrahelical (GX, GCX and GCX') and extrahelical (GX') conformations, in agreement with previous studies based on NMR spectroscopy and theoretical models.
When the DNA adopts intrahelical conformations, the electronic interactions among guanines remain similar to the reference sequence without lesion.However, the abasic void induces a significant disorder in the energies of the hole states, which translate into HT longer times by ~1-2 orders of magnitude.In the GX' sequence adopting an extrahelical conformation, in contrast, the unpaired cytosine and the abasic sugar are extruded from the DNA and the double helix collapses.This leads to a a reduced stacking among sites in the G strand, leading to a destabilization of the hole states over G 2 , G 4 and G 5 and a significant attenuation of electronic coupling values between these sites.Interestingly, this translates into an asymmetric behavior in HT kinetics, given that the 5' to 3' transfer along the guanine strand (G 1 → G 5 ) is slowed down by one order of magnitude, whereas HT in the opposite 3' to 5' direction remains similar to that estimated for the reference sequence.
Overall, our results thus show that abasic sites giving rise to intrahelical conformations generally slow down HT dynamics due to energetic disorder induced by the presence of the abasic void, whereas extrahelical conformations lead to less trivial implications on HT dynamics, including asymmetries in the directionality of HT.Thus, our results suggest that electrochemical monitoring of DNA repair process following changes in charge transfer properties can miss repair events linked to abasic sites adopting extrahelical conformations.

Fig. 1
Fig. 1 Sequences of the duplex DNA systems considered (left) and corresponding Kinetic Monte Carlo schemes (right) used to simulate hole transfer dynamics.Nucleobases enclosed in the boxes were included in the quantum-mechanical region in QM/MM calculations.

. 4 )
3) Compute the waiting time t w at site i, generating a new random number 0 < r 2 ≤ 1 as  != −   ! !" !Choose a new random number and repeat steps 2 and 3 recording the total accumulated waiting time.5) Stop if the final site has been reached or if the total time exceeds a fixed value t max .For each system, we averaged HT times over 10 6 realizations of the KMC algorithm.

Fig. 3
Fig. 3 Representative structures of DNA duplexes extracted from MD simulations based on the parmbsc1 force field: a) G, b) GX, c) GX', d) GC, e) GCX and f) GCX'.

Fig. 5
Fig. 5 Representation of the free energies computed using QM/MM calculations for the radical cation states localized on guanine sites.a) G, GX and GX' systems, b) GC, GCX and GCX' systems.

Fig. 6
Fig. 6 Structure of the abasic site region for the GX' system.

Table 1 .
Electronic couplings, free energy differences and forward and backward HT times computed for the DNA sequences.

Table 2 .
Overall hole transfer times computed using Kinetic Monte Carlo simulations for G1 → G5 and G5 → G1 transfers.