Polarizable QM/MM Multiconfiguration Self-Consistent Field Approach with State-Specific Corrections: Environment Effects on Cytosine Absorption Spectrum.

We present the formulation and implementation of a polarizable quantum mechanics/molecular mechanics (QM/MM) strategy to describe environment effects in multiconfiguration self-consistent field calculations. The strategy is applied to the calculation of the vertical absorption spectrum of cytosine in water. In our approach, mutual polarization of the solute and the solvent is solved self-consistently at the complete-active-space self-consistent-field (CASSCF) level, and the resulting set of charges and dipoles is used to calculate vertical excitation energies using the complete-active-space second-order perturbative (CASPT2) approach and its multistate (MS-CASPT2) variant. In order to treat multiple excited states, we converge the solvent polarization with respect to the state-averaged density of the solute. In order to obtain the final energies, however, we introduce a state-specific correction, where the solvent polarization is recomputed with the density of each state, and demonstrate that this correction brings the excitation energies closer to the values obtained with state-optimized orbitals. Comparison with PCM and nonpolarizable QM/MM calculations shows the importance of specific solute-solvent interactions and environment polarization in describing experiments. Overall, the calculated excitations for the π → π* states in water show good agreement with the experimental spectrum, whereas the n → π* appear at energies above 6 eV, approximately 1 eV higher than in the gas phase. Beyond solvents, the new method will allow studying the impact of heterogeneous biological environments in multiple excited states, as well as the treatment of multichromophoric systems where charge transfer and exciton states play important roles.

correction brings the excitation energies closer to the values obtained with stateoptimized orbitals. Comparison with PCM and non-polarizable QM/MM calculations shows the importance of specific solute-solvent interactions and environment polarization in describing experiments. Overall, the calculated excitations for the π→π * states in water show good agreement with the experimental spectrum, whereas the n→π * appear at energies above 6 eV, approximately 1 eV higher than in the gas phase.
Beyond solvents, the new method will allow studying the impact of heterogeneous biological environments in multiple excited states, as well as the treatment of multichromophoric systems where charge transfer and exciton states play important roles.

Introduction
Environment effects play a key role in photochemistry and photophysics. [1][2][3][4] The change in chromophore-environment interaction energy concomitant to a change in electronic state can lead, for instance, to significant shifts in the energies of excited states in vacuum compared to a condensed phase. This can modulate the potential energy surfaces and the internal conversion pathways available for a molecule to dissipate the energy after photon absorption. In complex systems where several chromophores come into play, the environment can modulate excited state dynamics in a variety of ways, for example by stabilizing charge transfer states or by modulating the electronic coupling promoting photoinduced charge and energy transfer processes. This complexity is exemplified by DNA photoprotection mechanisms, where internal conversion competes with charge and energy transfer processes highly sensitive to the surrounding environment. [5][6] Several fluorescent probes are also used to reveal a variety of biochemical and cellular processes with unprecedented detail, and in such cases a good understanding of the underlying photophysics is critical in the interpretation and design of the experiments. 7 This importance has led to continuous efforts to develop and improve the models that account for the environment in excited state quantum chemical calculations. 2,4 These models typically resort to hybrid models in which the quantum-mechanical (QM) description of the chromophore is combined with a classical description of the environment through continuum or molecular mechanics (QM/MM) models.
Continuum models, where the environment is modeled as a homogeneous dielectric medium, are very popular nowadays due to their efficiency and remarkable accuracy, especially for homogeneous environments (solvents) where no strong specific solutesolvent interactions are present. When strong interactions like persistent hydrogen bonds are present, or in complex systems where the environment is heterogeneous, like in biological matrices, QM/MM models are instead expected to give a better description as they explicitly account for the structural details of the surrounding molecules. This advantage however generally implies a larger computational cost associated to QM/MM compared to continuum models, because predicted properties need to be ensembleaveraged, typically over a set of structures extracted from a molecular dynamics (MD) simulation. Another problem with QM/MM concerns non-equilibrium solvation effects arising from the inertial (slow) component of the polarization of the environment. These effects can be incorporated into continuum solvation models in a straightforward way by considering different dielectric constants for the fast and slow components of the reaction field, but they are not properly accounted for in QM/MM because the MM description is limited to point charges based on standard non-polarizable force fields.
To overcome this limitation, several groups have developed QM/MM methods based on polarizable force fields, [8][9][10][11][12][13][14][15][16][17][18] an idea already introduced in the seminal QM/MM paper by Warshel and Levitt. 19 Such approaches solve for mutual polarization effects among the solute and the solvent until self-consistency. In particular, some of the present authors have recently developed a polarizable QM/MM (QM/MMpol) method to describe excited states and energy transfer in condensed phase based on a linear response approach, which can be applied at the configuration-interaction of single excitations (CIS), 15,20 Zerner's semiempirical intermediate neglect of differential overlap (ZINDO), 21 or time-dependent density functional theory (TD-DFT) levels of theory. 15 Similar implementations for TD-DFT have also been described. [22][23] In this work, we extend the QM/MMPol methodology to the complete-activespace self-consistent-field (CASSCF) method. 24 This is one of the reference methods for excited states, specially for the study of potential energy surfaces, in combination with the complete-active-space second-order perturbative approach CASPT2 25 and its multistate variant MS-CASPT2. 26 In our implementation, the solvent potential (electrostatic plus polarization) is obtained at the CASSCF level as an external perturbation. A similar implementation has been described recently where the solvent response is calculated with respect to a specific electronic state. 27 In contrast, our method is based on stateaveraged CASSCF and allows for the treatment of several excited states. During the CASSCF procedure, the solvent polarization response is converged with respect to the state-averaged density of the solute. This is complemented with a simple yet efficient correction to estimate the state-specific interaction energy with the solvent. Once the polarization and the state-averaged density are converged, the state-specific polarization is recomputed for each transition to correct for the final energies. Finally, the stateaveraged polarization is used to calculate the CASPT2 and MS-CASPT2 energies.
We apply our novel method to model the excitation spectrum of the canonical cytosine keto N1H tautomer (see Figure 1), which is known to be the main one in aqueous solution. 28 This subject is of great interest in the context of the photophysics of the DNA nucleobases. 29 Cytosine, in particular, is a challenging test system because it presents several low-lying states of n→π * and π→π * character that behave differently upon solvation. In the gas phase, the vertical excitation energy of the lowest π→π * transition, estimated from electron energy loss spectroscopic measurements, is 4.65 eV or 266 nm. 30 Theoretically, most methods including CCSD(T), 31 MS-CASPT2, 32 CASPT2, 33 MR-CI, 34 DFT/MR-CI, 35 TD-DFT 36 and ADC(2) 37 estimate that the π→π * transition is the lowest vertical excitation (S 1 ) in the gas phase (4.57-5.14 eV), with the lowest n→π * transition appearing as S 2 (4.76-5.54 eV). Depending on the method, S 2 is assigned to an excitation coming from the oxygen or nitrogen lone pair. The vertical excitation spectrum in water has been calculated theoretically with various multireference methods using the QM/MM and continuum approaches, [38][39] and the calculated solvatochromic shifts are approximately 0.2 eV, for the lowest π→π * transition, and 0.6 -0.8 eV for the two lowest n→π * transitions. More recently, the spectrum of cytosine in water has been calculated taking the internal degrees of freedom of the molecule into account and using a continuum approach for the solvent, 40  The benchmark for our study in solution is the experimental spectrum at wave lengths higher than 190 nm. 41 This part of the spectrum has been fitted to four bands at 264, 233, 214 and 195 nm, which are labeled I -IV in order of increasing energy. The relative intensity of the bands follows the order IV > III > I > II. A complete simulation of the spectrum is out of the scope of the paper, since it would require sampling the internal degrees of freedom of cytosine and including the eventual contribution of other, minor tautomers. 40 43 In addition, the relative intensity of the four bands is qualitatively recovered when polarization effects are explicitly accounted for. For the n→π * states, the blue shifts are of the order of 0.8 -1.0 eV, and this results in vertical excitations of approximately 6 eV. Overall, the obtained results indicate that our method is a valuable tool to explore environment effects on multiple excited states.

MC-SCF polarizable QM/MM model
In this work we extend the polarizable QM/MM method previously developed for the treatment of excited states and electronic energy transfer at the CIS, ZINDO and TD-DFT levels to an MCSCF description of the excited states. 15,44 This method explicitly accounts for environment polarization effects based on the induced point dipole model. Thus, MM sites are assigned an atomic partial charge and an isotropic atomic polarizability. Our extension is based on the MCSCF code implemented in the Gaussian software and in some aspects parallels the implementation of the PCM method. 45 The Hamiltonian of the solute is modified by introducing a term accounting for the interaction between the QM and MM regions and a term describing the energy of the MM system. This leads to the following effective Hamiltonian: In Eq. (1), Ĥ 0 is the Hamiltonian of the isolated QM system, the operator Ĥ QM/MM introduces the interaction between the QM and the MM regions, and the operator Ĥ MM describes the internal energy of the MM region. The latter terms can be dissected into electrostatic and polarization terms: where Ĥ dipoles is solved iteratively. This could also be achieved using a matrix inversion approach, but it would involve a larger computational cost. 15 When an expansion of the wavefuntion over a finite basis set is introduced, the QM-MM interacting operator can be split in the following electrostatic and polarization oneand two-electron terms: where V λν and  E λν indicate the electrostatic potential and electric field integrals, q i and µ i the MM point charges and induced dipoles, and N and M the total number of charges and polarizable MM sites. In our implementation, induced dipoles are dissected into a nuclear (n) and an electronic (e) component, ) , which depend on the nuclei and MM charges and on the electrons, respectively.
The final QM/MM and MM contributions to the energy can be expressed as: where P represents the one-electron density matrix on the chosen basis set, U n ele describes the interaction between charges and nuclei, U q ele the self-energy of the charges, and U qn pol and U nn pol the interaction between nuclear induced dipoles and nuclei and charges, respectively.
In our QM/MMpol scheme we use the state-averaged (SA) density matrix because MCSCF calculations with state-optimized densities for excited states higher than S 1 suffer from root-flipping problems and do not converge. In order to estimate the energy corresponding to the proper state-specific polarization of the environment, at convergence we recompute the electron induced dipoles in terms of the density of each state (P state ) and we introduce an energy correction defined as the difference between the energies recomputed with the two sets of dipoles, according to Equations (10): In the following, we present two sets of QM/MMpol data, those obtained with stateaveraged densities and the corresponding dipoles, ΔE SA , and those including the statespecific correction, ΔE SS . In addition, we have also implemented the QM/MMpol method at the CASSCF level with state-optimized orbitals, thus avoiding the stateaverage procedure, and we indicate this data by ΔE SO . In this case, the solvent polarization is self-consistently solved from the state-optimized density of the state of interest. Although this approach is only practical for the lowest excited state because of root flipping, it allows us to compare the shifts obtained with the state-specific correction to those obtained using state-optimized orbitals for S 1 .
Finally, we recall that the state-specific correction is obtained at the CASSCF level, and then used to correct CASSCF, CASPT2 and MS-CASPT2 energies.

Molecular dynamics simulations
The structure of the cytosine keto N1H tautomer was initially optimized at the MP2 level with the 6-311G(d,p) basis set, and then solvated in a box containing 1482 water molecules (buffer zone of 15Å) using the Leap module of the Amber9 software. 46 Water and cytosine were described using the TIP3P parameters 47 and the ff99 force field, 48 respectively, and the cytosine geometry was kept frozen during the MD simulations. The system was initially thermalized from 100 to 298 K during 100 ps at

Results and discussion
In Table 1 Table 2, and a graphical summary of the solvent shifts on the different states in Figure 2. For completeness, the CASSCF and CASPT2 data are provided in the Supporting Information (Tables SI1 and SI2). In vacuum, the first eleven excited states involve six π→π * and five n→π * transitions, whereas in the presence of the PCM or QM/MM environment these correspond to seven π→π * and four n→π * excited states. We center our analysis on the four lowest π→π * states, for which the experimental data are available, and the four n→π * states, whose energy falls in the same range. The dominant orbital transitions for these states are included in Table 1, and the orbitals are displayed in Figure 3. The π→π * states follow different trends. The two lowest states, of π H →π * L and π H-1 →π * L character, are shifted up by approximately 0.1 eV in the QM/MMpol approximation, and by smaller amounts with PCM and QM/MM. These small blue shifts are due to the fact that the excited states have smaller dipole moments than the ground state (4.1 and 5.4 Debye against 6.2 Debye for the ground state, see Table SI3 in the Supporting Information), which leads to an increase of the excitation energy in the polar environment. In contrast, the π H →π * L+1 and π H-1 →π * L+1 states, where the dipole moments are similar in magnitude to the ground state (6.1 and 6.4 Debye), are red shifted by 0.6 and 0.2 eV, respectively. The different shifts experienced, in this case, probably arise because the density rearrangement upon excitation to the π H →π * L+1 state interacts more favorably with the water configuration surrounding cytosine compared to that occurring in the π H-1 →π * L+1 state. Together with the excitation energy shifts, changes in the oscillator strengths of the π→π * states are also observed upon solvation.
The order of intensity of the experimental bands is IV > III > I > II. The vacuum calculations give a different order in terms of oscillator strengths, where band III of the spectrum (π H →π * L+1 state) has the strongest absorption and band I (π H →π * L state) the weakest one. This order is maintained with the PCM and QM/MM approaches, and the experimental order of intensities is only reproduced with the QM/MMpol approach.  Table SI3). At the state-averaged CASSCF level, these changes are described for the active space orbitals. At the CASPT2 level, dynamic correlation allows for a polarization of the σ electron density that compensates the polarization of the active space orbitals, and the differences of the total density with respect to the ground state density are reduced. This results in a smaller solvatochromic shift. In fact, the dipole moment differences between the n→π * states and the ground state reflects this behavior. They decrease from 4.0 -4.6 Debye at the CASSCF level to 3.3 -4.1 Debye with CASPT2 (see Table SI3). The π H →π * L state behaves similarly, since its blue-shift is overestimated at the CASSCF level (0. tautomer is preferred over other forms by more than 5 kcal mol. 28 Finally, for some of the states dynamic correlation has a large contribution to the excitation energy, and in this case our approach of solving the polarization only at the CASSCF level may be insufficient to obtain the full solvent effect on the excitation.
To compare the different solvation approaches, in Table 2 we provide the mean error for each solvation method, obtained as the difference between the calculated excitations and the experimental band maxima, averaged over the four bands. The mean errors indicate that, for the π→π * states, the calculated excitations approach the experimental values as one goes from PCM to QM/MM and QM/MMpol. Regarding the band intensities, only the QM/MMpol approach gives the right order of bands.
However, the relative intensity is not recovered correctly. According to the analysis of Ref. [ 41 ], the most intense band IV, is more than six times as intense as band II, which is the weakest one. In contrast, the calculated oscillator strength for band IV is only twice as large as that of band II. Such a large discrepancy is probably due to several causes.
First, the contribution of dynamic correlation to the excitation energy of this state in vacuum is more than 1 eV, and in this case our approach may not account for the full polarization effect, as noted above. The n 1 →π L * transition, which is calculated at 6.82 eV and has an oscillator strength of 0.06, may also contribute to band IV, making it more intense than what is predicted by our calculations for the π H →π * L+1 transition.
Regarding the state-specific correction, at the MS-CASPT2 level it amounts in most cases to 15-20% of the overall QM/MMpol shift. The states with the largest correction are the three lowest n→π * states and the π H →π * L+1 state, which also have the largest shifts. In these cases, the correction follows the same trend as the QM/MMpol shift, i.e. blue-shifted states are further shifted to the blue and red-shifted ones to the red. Focusing on the four lowest π→π * states, the comparison of the corrected and uncorrected QM/MMpol excitations with the experimental bands gives virtually no improvement when the corrections are included (see the mean errors in Table 2).
However, the comparison between the vertical excitations and the experimental band maxima has its limitations, as discussed above. To gain further insight into the benefits of the proposed correction scheme, we have compared it with the results of "fully" state-specific calculations using state-optimized orbitals. The excitations of S 1 have been calculated with the QM/MMpol scheme, using state-optimized orbitals, for five snapshots (see Table 3). These excitations, labeled ΔE SO , are obtained from two independent calculations for S 1 and S 0 . The values of ΔE SO are compared with the stateaveraged QM/MMpol excitations without and with the state-specific correction (ΔE SA and ΔE SS , respectively).
The QM/MMpol ΔE SO values are more blue-shifted with respect to the vacuum excitation compared to ΔE SA state-averaged ones by ~0.13 eV. This difference is inherent to the two schemes. On one hand, the SA-scheme only gives a partial account of the solvent effect because the density of the states and the polarization of the solvent are not fully consistent. Beyond solvent effects, however, the state-averaging procedure has its own limitations, as illustrated by the similar deviation of ~0.14 eV found between ΔE SO and ΔE SA results in analogous calculations performed in vacuum. This suggests that the major source of error is the state-averaging procedure itself, rather than the solvation model and the state-specific correction. Indeed, the state-specific correction reduces the deviation in QM/MMpol results from ~0.13 eV to ~0.11 eV, a value smaller than that found in gas phase. The main point is that in all snapshots the state-specific correction brings the shifts closer to the state-optimized values, and the corrected values are improved compared to the uncorrected ones. It would be desirable to carry out a similar test for the remaining excited states, but this was not possible because state-optimized calculations for higher states did not converge. However, the fact that the proposed corrections follow the correct trend for S 1 suggests that the corrections for the other states, which are larger in magnitude, also bring the calculated shifts closer to the more accurate, state-optimized results.

Conclusions
We have implemented a polarizable QM/MM model at the CASSCF level of theory and applied it to the calculation of the vertical absorption spectrum of the canonical tautomer of cytosine in water. In our approach, the mutual polarization of the solute and the solvent is solved self-consistently at the CASSCF level, and the resulting set of charges and dipoles is used to calculate the vertical excitation energies at the CASPT2 and MS-CASPT2 levels of theory. Our scheme allows treating multiple excited states by converging the solvent polarization with respect to the state-averaged density of the solute. We also propose a state-specific correction where the solvent polarization is recomputed with the density of each state and the CASSCF energies  [38][39] speak against this possibility. However, it cannot be excluded that the n→π * state may be lower in energy in other regions of the potential energy surface and may be populated after internal conversion from S 1 . This is a question that cannot be settled here, since it needs a study of the potential energy surface that requires having the gradients for the QM/MMpol method. To this end we plan to extend the gradients recently implemented at the linear response level 54 to MCSCF. This implementation will also allow the determination of conical intersections in solution.
Beyond calculations in solvents, our scheme will also be useful for the study of excited states of biological systems, where environmental effects and specific interactions between the chromophore and the environment are important. The method will also be useful for the treatment of multichromophoric systems in solution, where charge transfer and exciton states are relevant. For the latter, inclusion of explicit MM polarization is particularly important in order to recover the solvent screening effects in electronic couplings among excited states that lead to exciton delocalization. 20 This will be the subject of further future work. Table 1. MS-CASPT2(14,10)/cc-pVTZ state-averaged transition energies (ΔE SA ) and oscillator strengths (f) computed for the four lowest n→π * and seven lowest π→π * transtions of cytosine in vacuum and in water as described by PCM, QM/MM and polarizable QM/MM calculations. For the latter, the state-specific corrections (ΔE corr ) as well as the state-specific corrected transition energies (ΔE SS =ΔE SA +ΔE corr ) are also shown.