Refined metadynamics through canonical sampling using time‐invariant bias potential: A study of polyalcohol dehydration in hot acidic solutions

We propose a canonical sampling method to refine metadynamics simulations a posteriori, where the hills obtained from metadynamics are used as a time‐invariant bias potential. In this way, the statistical error in the computed reaction barriers is reduced by an efficient sampling of the collective variable space at the free energy level of interest. This simple approach could be useful particularly when two or more free energy barriers are to be compared among chemical reactions in different or competing conditions. The method was then applied to study the acid dependence of polyalcohol dehydration reactions in high‐temperature aqueous solutions. It was found that the reaction proceeds consistently via an SN2 mechanism, whereby the free energy of protonation of the hydroxyl group created as an intermediate is affected significantly by the acidic species. Although demonstration is shown for a specific problem, the computational method suggested herein could be generally used for simulations of complex reactions in the condensed phase.

the system from returning to visited places in the configuration space.
For a proper description of chemical bond exchange of reactive processes, ab initio or semi-empirical MTD, that is, metadynamics combined with on-the-fly ab initio or semi-empirical electronic structure calculations, is usually required. For the computational efficiency of such simulations, the run could be terminated after reaching the product basin.
While this approach does not give access to any information regarding the free energy landscape of the product, the free energy barrier can be estimated from the difference in values between the saddle point and the reactant minimum. The escape process from the reactant basin allows one to identify the reaction mechanism via the change of CVs in going from the reactant minimum to the saddle point.
Theoretically, MTD is an exact methodology in the limit of infinitesimally low frequency and height of the deposited hills. However, in practice, these values are finite and, often, large enough for an efficient sampling of configuration space. This gives rise to the characteristic roughness of MTD free energy surfaces, which in turn might lead to a significant source of error when it comes to determining free energy barriers or differences. For this reason, several variants of MTD have been recently proposed with the goal of improving the accuracy of the obtained free energy surfaces. These variants include cleverly designed adaptive schemes in which the increments of the bias potential become smaller as the simulation goes on, [23,[26][27][28] reweighting techniques applied a posteriori to the MTD simulations, [29,[35][36][37][38][39][40][41] and methods in which the reweighting is applied on-the-fly and the result of such reweighting is used to define the adaptive bias potential. [33] Improvement of MTD has been also suggested using the combination with umbrella sampling, [42,43] adiabatic free energy dynamics, [44] and temperature accelerated molecular dynamics. [45] Despite all these valuable recent efforts, converging the free energy surface with an error lower than k B T is still computationally demanding in general.
In this paper, we propose a method of refining the free energy estimate using the approximate W(q) obtained from conventional MTD simulations. As shown in Figure 1 schematically, it is a postprocess for conventional MTD to upgrade the bias potential by canonical sampling. We start from the W(q) set from upon reaching the transition state or from just prior to reaching the transition state. This allows us to explore the CV space q at the free energy level where the reaction takes place. Next, we perform a biased sampling in the reactant basin where W(q) is regarded as a time-invariant bias potential.
Here we add an artificial wall potential that has high values at the entrance of the product basin and zero values in the reactant basin, so as not to affect the sampling. The sampling can be performed by any kind of equations of motion that generates the canonical ensemble, that is, thermostated MD or MTD in the absence of the hills deposit.
The resulting density distribution ρ W (q) is then used to upgrade W(q).
After some iterations of this procedure, the bias potential should converge to the negative value of the free energy within the region of q from the reactant basin to the transition state. Eventually, the sampling becomes almost uniform in this region from which one can judge the convergence. Unlike conventional MTD, the sampling is done always with a fixed W(q) at the last upgrade, so the unbiasing can be done without any approximate assumptions. Thus, the final error could be less than k B T after sufficient sampling, in principle.
In this study, we use the refined MTD to estimate the free energy barrier for dehydration of polyalcohols in high-temperature aqueous solutions. The dehydration reaction of polyalcohols is seen as an important step for the conversion of biomass into cyclic ethers. Hightemperature water has attracted attention from a viewpoint of green chemistry because of the advantages of being less toxic, cheaper, and easier to handle than organic solvents. [46][47][48] The reaction barriers in aqueous media are sensitive to temperature, pressure and acidity, which opens up the possibility of optimizing the reaction conditions.
In previous experiments, it was reported that the rate of polyalcohol dehydration can be accelerated significantly under acidic conditions. [49][50][51] Recently, the reaction rates of polyalcohol dehydration in high-temperature water and carbonated water has been studied systematically by an experimental group of Shirai, Yamaguchi, et al. [52][53][54][55][56][57] Accordingly our research group has been working on elucidating the mechanism of this reaction from a standpoint of computational chemistry. [58,59] From the analysis of dehydration of simple polyalcohols, that is, 2,5-hexanediol (HDO) and 1,2,5-pentanetriol (PTO), we proposed that the main reaction is dominated by a protonation-assisted S N 2 pathway. However, we have not yet proved the acidity dependence that has been observed experimentally. In this F I G U R E 1 A schematic figure for conventional MTD and its refinement. The bias potentials are depicted for those after (left) conventional MTD in blue, (middle) the first refinement in pink, and (right) the second refinement in orange study, our aim is to compute the free energy barriers for the dehydration reaction of 2,5-HDO in different acidic solutions in order to verify that the proposed S N 2 reaction mechanism shows such an acidity dependence.

| THEORY
Consider a situation where the negative value of the free energy surface, W(q), has been obtained approximately after a conventional MTD simulation, and we wish to refine the quality of that W(q). In this case, running an isothermal simulation in the presence of a timeinvariant bias potential W(q) would generate a canonical distribution where β = k B T, Q(r) is the set of CVs given as a function of atomic coordinates r, V (r) is the potential energy of the system, and is the canonical distribution in the absence of the bias potential. Now suppose that the biased simulation gave ρ W (q) as the probability distribution of finding the CVs at q = Q(r). Then, the Helmholtz free energy surface defined by A(q) ≡ − β −1 lnρ(q) can be estimated as where the unbiasing correction is, apart from a redundant constant, The relationship in Equation (3) becomes exactly equal when the amount of sampling is infinitely large.
In conventional MTD, the free energy is described as a sum of Gaussian functions (hills) to obtain a smooth surface. In the same way it would be convenient also to describe the free energy correction ΔW(q) as a sum of hills from the histogram of distributions, which will be shown below. In this way, the refinement of ΔW(q) could be done in an iterative manner by carrying out biased simulations with the upgraded bias potential, as shown by the schematic in Figure 2. Note here that the bias potential is shifted incrementally until convergence is attained.
This meets our purpose to estimate the free energy barrier since it allows sampling at the same free energy level for the reaction of interest. This is in contrast to conventional MTD where the free energy level is continuously elevated during the simulation. The more sampling we perform, the better quality we obtain for the free energy surface.
We now describe our implementation to compute the free energy correction described by N grid grid points assuming the form where h j is the height of the hill centered at the jth grid point in the CV space. For simplicity, it is assumed that the hills have a common width for all the grid points, and they are the elements of the diagonal matrix, To optimize the set of heights of the hills, is minimized. Then dE/dh = 0 leads to a linear equation where we have defined and In conventional MTD, W(q) is expressed as a sum of Gaussian hills, see Equation (12)  Equation (7) can be solved numerically, for example, using the LAPACK library [60] (DSYSV or DGESV). The grid points should cover the CV space where the histogram is non-zero, but they do not need to be regular. The value of jσj should be set greater than half of the grid spacing to make the W(q) surface smooth. For large grid sizes with high-dimensional CVs, evaluation of C ki matrix could be a computational bottleneck. In that case, the summation of C ki matrix can be done only for the grid points j where q j are close to both q k and q i , within a certain cutoff distance that is set much larger than jσj. For angular CVs, periodic boundary should be applied to Equations (8) and (9) to obtain the set of heights h.

| SIMULATION METHOD
The refined MTD has been applied to the dehydration reaction of molecule and 30 H 2 O molecules, either with or without acidic species (H 2 CO 3 , H 2 SO 4 , or HCl) added to it. It was confirmed previously that the system size effect on the free energy barrier is small for this system since the reaction takes place locally involving several water molecules in the vicinity of HDO. [58] The atoms were placed in a cubic box with a volume of (11.38 Å) 3 , which was preliminarily determined from classical molecular dynamics simulations in the NPT ensemble at 20 MPa and 523 K using the OPLS all atom force field. [61] Periodic boundary condition was applied to the system to represent the bulk aqueous solution. On-the-fly electronic structure calculations of the Born-Oppenheimer forces were done with a semi-empirical approach based on the self-consistent-charge density-functional tight-binding (SCC-DFTB) method [62][63][64] using the 3ob Slater-Koster parameter set. [65,66] The massive Nosé-Hoover chain (MNHC) thermostat technique [67][68][69][70] was used to control the system efficiently. The simulations were conducted using a combination of the PIMD [71,72] and DFTB+ [73,74] software packages.
Three CVs (N cv = 3, q 1 = d, q 2 = ϕ, q 3 = n) were chosen based on minimum energy path analysis of a microsolvated model of 2R,5R-HDO. [58] As shown in Figure 3, d is the difference in C 2 O 7 and C 2 O 8 distances, which represents the exchange of bond breaking and formation upon the ether ring formation, ϕ is the dihedral angle of O 7 C 2 C 3 C 4 atoms which characterizes the conformation of 2R,5R-HDO molecule, and n is the hydrogen coordination number of the O 7 atom which represents the protonation of the O 7 H hydroxyl group. Note that the C 2 O 7 bond is not included in n. We used the sum of rational functions for this where the bond length parameter is r b = 1.4 Å. Unlike our earlier works, [58,59] we did not employ artificial wall potentials in the large d regions to restrict the conformational sampling. Instead, the whole sampling, including the conformational change of 2R,5R-HDO, was undertaken using the refined MTD. Artificial wall potential at d < 0 was employed in the refinement stage to prevent sampling from the product region.
The pseudo-Hamiltonian of conventional MTD was given by where the first term is the Hamiltonian with respect to the N w copies of the molecular system, H sys (r (k) , p (k) ), where k is the index of the copy. Each of them represents the sum of kinetic energy of atoms and the Born-Oppenheimer potential calculated with the DFTB method, V (r (k) ). The second term is the kinetic energy of fictitious particles (N w i , and μ i are the momentum, the position and the mass of the walker, respectively, for the CV component i and the walker k. The third term is a harmonic potential that binds the position of the walker to the actual CV value, Q i (r (k) ), where κ i is the force constant. The fourth term is a history-dependent bias potential expressed F I G U R E 3 The structure of 2R,5R-HDO molecule in the extended conformation. The atoms related to the CVs, d, n and ϕ, are colored in yellow, green and blue, respectively as the sum of hills deposited to the places where all the walkers have visited until time t, that is, where h and σ i are the hill height and the hill width, respectively. The fifth term is the contribution from the MNHC thermostats. In our implementation, a Nosé-Hoover chain is coupled to each degree of freedom of the atoms and for each walker, and it is also coupled to each degree of freedom of the fictitious particle on the CV space for each walker. Thus, the total number of MNHC thermostats is (3 N First, conventional MTD simulations were undertaken using To speed up the calculation, the hill size was set initially to (h, σ 1 , σ 2 ,  [20] the fictitious mass was set as μ i = k B T τcv 4 | RESULTS

| Numerical test
Before proceeding to production simulations, we tested the performance of the refined MTD introduced in the previous section. Since we would like to use a simple diatomic model that imitates the C O dissociation involved in the polyalcohol dehydration reaction, we designed a Lennard-Jones potential function scaled by a cosine function, with ϵ LJ = 40.0 kcal/mol and σ LJ = 1.40 Å. As depicted in Figure 4A was computed according to Equation (3), which is shown in Figure 4C.
The MD calculation was continued with the upgraded bias potential W 1 (r) for an additional 1.5 ns. The free energy curve A(r) ≈ −W 2 (r), which was further refined by the histogram of ρ W1 r ð Þ , is shown in Figure 4D.

| Simulation results
These subspaces contains both the minimum and saddle points on the 3D free energy surface since the primary reactive mode is parallel to the CO bond-exchange variable, d. In Figures 5 and 6, A ϕ (d, n) and A n (d, ϕ) are shown where the origin is chosen to the value at the transition state (the minimum on the d = 0 lines). Thus, the darker colors on the free energy surface mean that the energy basin is deeper.
In Figure 5, the free energy surfaces A ϕ (d, n) and A n (d, ϕ) are shown for the case of 2R,5R-HDO in H 2 CO 3 solutions. The two figures, Figure 5A,B, are displayed to compare the quality of the free energy surfaces obtained before and after the refinement of conventional MTD. The smoothness of the free energy surfaces clearly reflects the improvement by the refinement. The refined MTD was able to remove the roughness with less spurious peaks and valleys in the surface. The difference of free energy profiles, Figure 5A,B, is shown in Figure 5C. In the unrefined MTD, the RMSE (MaxAE) with respect to negative free energy region of A ϕ (d, n) and A n (d, ϕ) were estimated to be 1.5 and 1.6 kcal/mol (3.8 and 7.3 kcal/mol), respectively. Figure 6A-C shows the results of the 2D subspaces, A ϕ (d, n) and A n (d, ϕ), in the HCl solution, in the H 2 SO 4 solution and in pure water, respectively, obtained from the refined MTD. Compared with our earlier works using conventional MTD, [58,59] the refined MTD has significantly improved the quality of free energy surface. selectivity of this reaction confirmed experimentally. [53] For the acidic solutions, the coordination number n ' 2.0 at the transition state implies that the reaction involves the protonation of hydroxyl group.
As to the reaction process just mentioned, the DFTB simulations were able to reproduce the results of our earlier Car-Parrinello MD (CPMD) simulation based on density functional theory (DFT). [58] However, a difference between DFT and DFTB was seen in the protonation mechanism of the hydroxyl group. The protonation occurs via a Grotthuss relay mechanism among water in the case of DFT, while the proton was mostly captured from the acid group in the case of DFTB. We believe this is an artifact of DFTB3 with the 3ob parameter set, which unfortunately could not also be removed by applying the D3 dispersion correction. [75] In spite of this, the DFTB simulations were able to capture qualitatively correct features of the free energy profiles as compared with the DFT simulations. This was the case in our previous work on the dehydration of PTO as well. [59] In Table 1  mechanism. This is in contrast to the S N 2 mechanism found in this computational study, in spite of the difference in temperature and pressure conditions (573 K, 20 MPa). The peculiarity of sulfuric acid is an issue to be addressed in the future.
The computed values were compared with the experimental free energy barrier, ΔA exptl . This was estimated from the measurement of the half life of 2R,5R-HDO, τ exptl ≈ 30 min, combined with the TST reaction rate, where h is the Planck constant. The factor 2 in the second equality arises from the fact that the reaction can proceed in two equivalent pathways leading to the same cis-2,5-DMTHF  (4) The ether ring of cis-2,5-DMTHF is formed. After this, the proton bonded to the ether will be released to water to complete the reaction F I G U R E 8 Dehydration process of 2R,5R-HDO in pure water: (1) HDO is its confined conformation. (2) The CO bond is elongated when the OH group is surrounded by water. (3) The OH − ion is eliminated by water. (4) The ether ring of cis-2,5-DMTHF is formed. After this, the proton bonded to the ether will be released to water to complete the reaction ΔA dftb values for pure water, 50.6 ± 0.8 kcal/mol, was much higher than that for the H 2 CO 3 solution. This is also consistent with the experimental observation that the reaction in the H 2 CO 3 solution is at least 50 times faster than that in pure water. [53] Why is the free energy barrier lower for strong acidic solutions?
Comparing the free energy profiles in Figures 5 and 6 Finally the ether ring is formed to become cis-2,5-DMTHF. The reaction processes of the aqueous HCl and H 2 SO 4 solutions were found to be in the S N 2 mechanism, and were similar to the case of the aqueous H 2 CO 3 solution.
The transition state configuration in pure water is different from that in acidic water. Specifically, the C OH bond is broken without protonation in pure water. This can be seen in Figure 6C that the coordination number is n ≈ 1.4 when d = 0, in contrast to the case of acidic solutions, n ≈ 2.0. Yet, the transition state in pure water leads to the same product, namely cis-2,5-DMTHF. These results demonstrate the possibility that polyalcohol dehydration in high-temperature water follows different pathways depending on the presence of excess protons. Thus, we speculate that the anomalous acidity dependence on the reaction kinetics as discussed in experiments [50,51] is ascribed to the multiple pathways competing with each other.

| CONCLUSION
In this paper, we have proposed a refinement of MTD simulations a posteriori to help reduce the error of free energy barriers of reactive processes. Making use of the hills obtained from the MTD as a timeinvariant bias potential, a canonical sampling is performed at the same free energy level as the reaction barriers. This method was used to study the dehydration reaction of HDO in high-temperature water with a focus on the acid dependence. It was found that the acid species affects significantly the free energy profile of the S N 2 reaction pathway, which can be traced back to the protonation free energy of hydroxyl group in hydrated environment. This result provides a theoretical foundation for the experimental observations that the rate of this reaction is accelerated by the acidity without losing the R/S stereoselectivity.
Finally, it is worth pointing out that the refined MTD was very helpful in this study. For the calibration of the acid dependence, it is essential to estimate accurately the free energy barriers that can differ from each other only by few k B T. This method works as a postprocessing, making a full use of the data from conventional MTD simulations that usually have taken a large computation effort to obtain already. Importantly, the method is simple but robust, and does not require any changes to existing software for conventional MTD simulations. Our method allows the refinement of the bias obtained from any other variants of MTD, such as the well-tempered metadynamics.

ACKNOWLEDGMENTS
The computations in this work were mostly done using the supercom-