Ligand Binding Rate Constants in Heme Proteins Using Markov State Models and Molecular Dynamics Simulations

Computer simulation studies of the molecular basis for ligand migration in proteins allow the description and quantification of the key events implicated in this process as, such as the transition between docking sites, displacements of existing ligands and solvent molecules, and open/closure of specific “gates”, among other factors. In heme proteins, especially in globins, these phenomena are related to the regulation of protein function, since ligand migration from the solvent to the active site preludes ligand binding to the iron in the distal cavity, which in turn triggers the different globin functions. In this work, a combination of molecular dynamics simulations with a Markov-state model of ligand migration is used to the study the migration of O 2 and · NO in two truncated hemoglobins of Mycobacterium tuberculosis (truncated hemoglobin N -Mt-TrHbN- and O -Mt-TrHbO). The results indicate that the proposed model provides trends in kinetic association constants in agreement with experimental data. In particular, for Mt-TrHbN, we show that the difference in the association constant in the oxy and deoxy states relies mainly in the displacement of water molecules anchored in the distal cavity by O 2 in the deoxy form, whereas the conformational transition of PheE15 between open and closed states plays a minor role. On the other hand, the results also show the relevant effect played by easily diffusive tunnels, as the ones present in Mt-TrHbN, compared to the more impeded passage in Mt-TrHbO, which contributes to justify the different . NO dioxygenation rates in these proteins. Altogether, the results in this work provide a valuable approach to study ligand migration in globins using molecular dynamics simulations and Markov-state model analysis.


Introduction
Ligand migration is a key process in protein function since it involves the transition of the ligand from the protein exterior to the active site, which in turn can trigger the molecular mechanism associated with the specific protein function 1 . In heme proteins, small ligands as CO, NO, O 2 and H 2 S can migrate from the solvent to the distal cavity, where they bind to Fe(II) and/or Fe(III) in the heme group, leading to ligand transport and storage, sensing properties, and enzymatic catalysis, among other functions. Globins are a widely expressed family of heme proteins, which share a 3-over-3 helical sandwich fold. Structurally closely related, truncated hemoglobins (TrHb) are smaller members of the globin family, they being characterized by a 2-over-2-helical motif 2 .
The characterization of ligand migration through computer simulations usually requires a vast exploration of the free energy surface of the system, which exceeds the possibilities of standard atomistic molecular dynamics (MD) simulations to attain a complete sampling and derive an accurate kinetic and thermodynamic description. Accordingly, one has to resort to enhanced sampling techniques to overcome these difficulties. As an example, free energy profiles for ligand migration through the tunnels present in TrHbs have been obtained by resorting to biased MD techniques, such as the coupling between Multiple Steered MD (MSMD) methods 3-6 and Jarzynski's equality 7 . In this framework, the ligand is forced to migrate along the tunnel using a variable harmonic restraint, which implies the definition of a geometrical coordinate that guides the ligand passage through the tunnel 8 . Although this procedure is computationally efficient, it also presents several drawbacks, especially because the tunnel topology cannot be generally described using a simple geometrical coordinate. Furthermore, the results may be affected by the pulling velocity and the total amount of trajectories, among other factors. Finally, estimating the free energy from the exponential average of the irreversible pulling work along the reaction coordinate for a finite number of trajectories can lead to notable errors 9 .
There have also been attempts to tackle these issues using other techniques, such as the implicit ligand sampling methodology 10 or combining free MD simulations with a numerical solution of the Smoluchowski equation 11 . Metadynamics 12 and weighted ensamble 13 simulations have also been used to characterize the passage of ligand through channels. Alternatively, a promising strategy consists of the use of Markov-State Models (MSM), which have been widely applied to protein folding processes 14,15 .
These methods discretize the conformational space into distinct "sites", and characterize the transition probabilities from one site to another. A clear advantage is the possibility to characterize those transition rates or probabilities independently by performing multiple short MD trajectories, ensuring the sampling of transitions between contiguous sites.
In the field of ligand migration, a continuous time-discrete space MSM (ctdsMSM) was used by Blumberger and coworkers [16][17][18][19] , to study the migration of CO, O 2 and H 2 in Fe-Fe and Fe-Ni hydrogenases, and then obtain second-order kinetic rates with a phenomenological kinetic model. This approach is based on the construction of rate matrices, by estimating the number of times the system transitions between two given states and the average lifetime of the system in each site. It is chemically intuitive, considering that the system can transition in a unimolecular fashion from one site to another and transitions depend on the lifetime of a given state. They also explored the possibility of calculating the fluxes through surfaces that separate the initial and final states. To our experience, the need of estimating the average lifetime of the ligand in a certain state turned out to be not easy to be calculated in a reliable way. Furthermore, the uncertainty in cluster definitions may introduce spurious transitions, making it difficult to estimate the lifetimes of the ligand at the different sites.
On the other hand, Meuwly's group studied ligand binding to Myoglobin 20 and Mycobacterium tuberculosis (Mt) trHbN 21-23 using a discrete time-discrete space MSM (dtdsMSM) approach. This strategy avoids the calculation of the lifetime of the system in each state. Nevertheless, it is necessary to introduce and choose a temporal parameter (τ) to ensure that the model satisfies the Markov assumption.
Meuwly et al. efficiently characterized the transition rates between sites and determined a characteristic time for the decay of the population in each of them. This information is of great relevance, since it allows obtaining a detailed characterization of individual sites.
Here, using that transition network information, we propose to extract additional information by evaluating the collective evolution of the sites as a function of time, starting from an initial proposed population, so that a phenomenological model can be fitted. To this end, we apply a novel combination of MD simulations, MSM and empirical kinetic considerations to characterize the migration of ligands in Mt-TrHbs ( Figure 1). Mt is the pathogen that causes tuberculosis, a disease that produces over three millions deaths each year and affects two billion people around the world 24 . It expresses two members of the TrHb family 25,26 , named O (Mt-TrHbO) and N (Mt-TrHbN). This latter protein has a crucial role in ·NO detoxyfication (NOD), a process relevant to warrant the survival of the bacillus 27 . Active in the ferrous form, Mt-TrHbN converts O 2 and ·NO into the harmless species NO 3 -, avoiding the attack from the immune system of the host. To do so, ligands diffuse from the protein exterior to the distal cavity through a complex network of 6 tunnels ( Figure 2) which has been extensively characterized both experimentally 28,29 and computationally 3,21,30,31 . Briefly, three main tunnels were identified in Mt-TrHbN: the long tunnel (LT), located between the B and the E helixes, the short or G8 tunnel (STG8), and the E7-gate tunnel, which is found in many globins, Finally, an additional channel named EH tunnel was also described 27 . The biological function of Mt-TrHbO remains unclear 32 . It oxidizes nitric oxide, decomposes hydrogen peroxide 33 and has been proposed to act as a dioxygen carrier from the cytoplasm towards the cell membrane 34,35 under hypoxia conditions. Its NOD activity is less efficient compared to other heme proteins in the same organism. Ligand migration in Mt-TrHbO is mainly carried out through the LT. The short channel is blocked by the presence of a tryptophan in the G8 position 36 , and the E7 entrance is sealed by electrostatic interactions between positively charged residues and heme propionates 37 .
There is a considerable amount of biophysical kinetic data, which is briefly summarized in Table 1. The difference in rate constants for O 2 uptake in deoxy Mt-TrHbN (k on ) and ·NO dioxygenation in oxy Mt-TrHbN (k ox ) is remarkable, and this selectivity has been a subject of discussion for almost twenty years 3,38 . In this study, for the sake of clarity, we will refer to k ox ·NO as k on , although it is worth noting that it does not correspond to ·NO binding to the deoxy form. Since ligand migration has been previously shown to be the rate limiting step for ·NO oxidation, the entrance kinetic rate is expected to be similar to the overall k ox . Many distal mutants have been expressed and characterized in order to provide a microscopic interpretation 38  Lately, the lower k on for O 2 relative to NO oxidation was also attributed to the presence of strongly retained water molecules in the vicinity of the ferrous unligated heme iron 38 . Although it is known that water molecules do not bind to the ferrous heme, polar distal residues can act as hydrogen bond donors and therefore interact with water molecules in the distal pocket. In Mt-TrHbN, key distal amino acids are TyrB10 and GlnE11 ( Figure 2). It is important to note that while the first ligand (O 2 ) needs to displace the water molecule to bind to the heme, the second ligand just needs to approach the bound O 2 for the reaction to proceed, avoiding the need to overcome the free energy barrier for water release. The effect of displacing water molecules in the distal cavity on the kinetic association constants was analyzed globally in the truncated hemoglobin family, and was proved to be a key factor in determining ligand migration 39 .
Let us note that another mechanism for ·NO dioxygenation where nitric oxide could bind first to the heme, especially under high ·NO concentrations 40 , may be considered. In this case the rate limiting step was proposed to be the ·NO substitution by O 2 in the distal coordination position. Therefore, it is important to remark that the physiologically relevant mechanism will depend on the intracellular concentrations of both gaseous ligands.
In this work, we use a computational scheme outlined above, and summarized in Figure 1, to address a comprehensive analysis of ligand migration in Mt-TrHbN and Mt-TrHbO. The aim is to describe qualitatively and quantitatively the process of ligand migration in these proteins and derive kinetic rate constants from nonguided MD trajectories. For Mt-TrHbN, we aim to obtain a quantitative description of ligand migration towards the distal cavity in the oxy and deoxy states, and disclose the molecular basis for the observed difference in ligand migration for O 2 and ·NO in deoxy and oxy ferrous forms.
Specifically, attention will be paid to i) the existence and relevance of the PheE15 gate, and ii) the presence of water molecules in the distal cavity. Finally, we also compare the ·NO oxidation rate between Mt-TrHbN and Mt-TrHbO where, as observed in the constant rate values shown in Table 1, the protein matrix is likely to hinder ·NO migration towards the distal cavity.

Theoretical Methods
In order to estimate the rate constants in TrHbs, we applied a protocol ( were modeled using three-site molecules [46][47][48] , to adequately describe the quadrupole moment of the ligands. The initial structures for Mt-TrHbN and Mt-TrHbO were taken from the PDB (entries 1IDR 49 and 1NGK 50 , respectively). Ionizable side chains were assigned to the predominant form at physiological pH, and histidine protonation state was determined according to the hydrogen bond interactions formed with surrounding residues. The proximal histidine was protonated in the δN, so that it binds to the heme iron through its εN.
For both oxy and deoxy Mt-TrHbN, and oxy Mt-TrHbO systems, a truncated octahedron solvation box was constructed with approximately 9000 TIP3P water molecules. We used a 2 fs time step for the MD integrator and Berendsen thermostat and barostat 51 . A short energy minimization was performed to avoid steric clashes, followed by an 800 ps heating to a final temperature of 300K. The system was simulated for 500 ps using the NPT ensemble in order to achieve a suitable density for the system.  53 Let us note that in this latter work, the authors indicate that the location and population of the clusters are not significantly affected by the clustering methodology. However, some differences in the kinetic behavior are observed especially when using a more complex method such as the locally scaled diffusion map method.

Construction of the transition matrix and intersite kinetic rate estimation. Each snapshot in the short
trajectories was assigned to a previously defined cluster, which are numbered 1 to N. Thus, every trajectory is encoded as the succession of the clusters the system has visited. Let τ be a temporal parameter, known as lag time. A transition from cluster i to j is considered to occur in a certain frame if the given frame belongs to the cluster i but τ frames later it was labeled as j. The method of the sliding window 54 was employed to construct the count matrix. This method evaluates every frame of the trajectory, retrieving non-independent information on the transitions, which are statistically correlated. The other possible scheme is the "independent count" scheme, where the frames inside the interval [x, x+τ] are not used. We preferred the first scheme, as it allows us to extract as much information as possible from our trajectories. However, the "independent count" scheme was applied in order to estimate uncertainties for the calculated rate constants, as explained in the SI.
The transition matrix is constructed as an N dimensional squared matrix in the following way.
The lag time must be selected in order to capture the characteristic time of the physical process of interest, and was done as suggested by Bowman, Pande and Noé 54 . The choice of a too small value for τ leads to non-Markovian behavior of the resulting MSM. We tested different values of the lag time and observed that when τ < 300 frames, the Markov property was not ensured in the model. We considered that a value of τ=400 frames, which is equivalent to 40 ps in time units, was adequate for this study. As  This model fitting procedure yielded a pseudo first-order kinetic rate (values can be found in Table S2 from the SI). ( Since our simulation scheme did not include the estimation of the kinetic rates from the bulk solvent into the channel entrances, the calculation of the absolute second-order kinetic rates is not possible.
Nevertheless, we present ratios between those fitted pseudo first-order kinetic rates, which reproduce the experimental trends. Although we intended to minimize possible methodological restrictions to the determination of rate constants, the obtained values are affected by different intrinsic factors, such as the definition of the different states, and the temporal parameter chosen to build the transition matrix, among others. In this context, although only estimations of rate constants can be obtained, this approach is valid to compare the ligand migration through the protein assuming that the bimolecular contribution to the kinetic rate is analogous for the TrHbs in their oxy and deoxy forms.
Tunnel entrance solvent accessible surface (SASA) and cavity volume estimation. Since we do not model the bimolecular diffusional step (Scheme 1), we try to determine how encounter pair formation between a ligand and different proteins will differ by means of analyzing the surface of the tunnel entrances. Solvent accessible area (SASA) was calculated using CPPTRAJ 52 program from the Amber Package, employing default probe radius parameters. To evaluate the tunnel entrances, we calculated the SASA for those protein residues that were more exposed to the solvent. In the case of Mt-TrHbN, the long channel entrance was estimated using Ile19 Ala24 and Ile25. The short channel exposed area was calculated using Ala95, Leu116 and Ile119. In the case of Mt-TrHbO, Met90, Leu50, Leu114 and Leu110 were considered to act as the entrance of the long channel. The volume of internal cavities was estimated using Caver 2.0 55 , using an spherical 0.87 Å probe. The cavities calculated that corresponded to any of the described tunnels were summed.
Unimolecular water self-exchange process. As stated before, it is known that water molecules can be retained in the proximity of the ferrous iron by polar distal residues in this family of proteins. To examine this effect, we generated 1 µs MD trajectory of deoxy Mt-TrHbN and evaluated the presence of water molecules in the distal cavity. It was found that a water molecule is constantly anchored to TyrB10 and GlnE11 residues, the distance from the water oxygen atom being in agreement with the formation of hydrogen bond interactions. The residence time of those molecules was measured and an average residence time was calculated.

Results and discussion
Ligand migration in Mt-TrHbs is an interesting test system to calibrate the methodological strategy  (Table 1). PheE15 may exert a role in this process by regulating the accessibility through the LT, changing its side chain conformation from "closed" to "open" states, and hence allowing a faster migration of the second ligand ( . NO) after O 2 binding. Indeed, Figure 4 shows that the PheE15 side chain modifies the ligand positions sampled in MD simulations. With this strategy, the aim is to evaluate the ligand migration process to and from the distal site through the protein matrix in the two distinct PheE15 conformational states. As can be observed in Figure S2  In the case of ·NO entrance to the oxy protein, k in were almost equal in the two PheE15 conformations (Table S1) (Table S2). These results indicate that the PheE15 conformation has a mild effect on the ligand migration, which does not suffice to explain the differences in the experimental rates.
An alternative factor for ligand differentiation is the displacement of the water molecule anchored close to the sixth coordination position (see Figure 5), already proposed by Guertin et al 35 , a process that should affect the O 2 binding in the deoxy state, since the approach of O 2 to the heme Fe(II) is hindered by the water molecule in the distal cavity. In this scenario, it is possible to propose a unimolecular reaction step that involves water self-exchange (or at least, a release from the hydrogen bond anchoring), a rare event that would act as a rate limiting step in the mechanism proposed in Scheme 2.

Scheme 2.
Kinetic scheme proposed for oxygen binding to deoxy Mt-TrHbN. In this case, the second step is proposed to be rate limiting. Under the assumption that this first-order process is independent of the presence of other molecules in the distal cavity, the associated rate can be estimated as the inverse of the mean lifetime of an anchored water molecule in the distal cavity, as follows where T i is the lifetime of the ith water molecule that has occupied the distal position ( Figure 5).
This rate was estimated from the analysis of the residence time for water molecules found in the distal cavity in 1µs unrestrained MD simulation of the deoxy Mt-TrHbN system. Approximately 25 water exchange events were observed along the trajectory, and the average lifetime was estimated to be about 30 ns.
Since this elementary step of the mechanism is rate limiting, we propose that pseudo preequilibrium conditions are established and the k on could be estimated as where the equilibrium constant K mig was estimated from the ratio between k in and k out , obtained by adjusting the distal site population in an equilibrium relaxation scheme (see SI for more details).
For ·NO oxidation in oxy Mt-TrHbN, the migration through the protein matrix is a kinetically relevant step of the mechanism (Scheme 3). Since O 2 is already bound to the heme iron, there is no need to displace water molecules. Furthermore, the chemical reaction has been reported to occur in a barrierless process 50,51 Considering the reaction mechanism shown in Scheme 3, the k on estimated for NO corresponds to the calculated k in value.
Under these considerations, the ratio k on ( Table S2). The experimental value corresponds to the ratio between the rates determined for nitric oxide oxidation and oxygen binding.
Overall, the results allow us to estimate in a relative way the influence exerted by the i) PheE15 conformation, and ii) the distal water displacement on the ligand migration in Mt-TrHbN. The analysis of the two molecular assumptions in a phenomenological kinetic scheme allowed us to envision how distal water displacement accounts for most of the ligand selectivity, whereas PheE15 gate conformation provides a minor contribution.

NO dioxygenation in Mt-TrHbO and Mt-TrHbN.
To further validate the predictive power of the applied methodology, and the ability to capture the effect of the protein matrix in ligand migration, we analyzed the case of ·NO migration in oxygenated Mt-TrHbO.
In this case, oxygen k on [35] (1.1 10 5 M -1 s -1 ) can be explained as well in terms of distal water anchoring, mimicking the case of the Mt-TrHbN. The NO dioxygenation rate (6.105 M -1 s -1 ), [35] however, is considerably lower than the almost diffusionally limited rate observed in the Mt-TrHbN case. In our model, the kon for NO can be directly estimated from the obtained kin value, as in the case of TrHbN, since the chemical activation of dioxygen to react with nitric oxide is considered to occur with a low activation barrier, and hence the kinetic efficiency loss for ·NO oxidation is mainly due to protein matrix effects. The calculated value for the pseudo-first order rate constant (kon NO) resulted (4±2).10 6 s -1 for TrHbO, compared to (1.5±0.7) 10 8 s -1 determined for Mt-TrHbN (Table S1).
Although there is qualitative agreement with the experimental data, the calculated ratio is smaller than the experimental one determined from the available rate constants (Table 1). This difference may be attributed, at least in part, to the distinctive topological features of the migration tunnels in the two proteins. First, inspection of the positions occupied by . NO in the protein interior throughout the MD simulations, as noted in Figure 6, shows that ligand uptake mainly involves the migration through the long channel 5,38 . Accordingly, the total volume of the internal cavities visited by the ligand in the two proteins is larger in Mt-TrHbN (Table S3)  These two effects, i.e. the larger probability of the ligand to find the channel entrance from the bulk solvent, and the ability to concentrate ligand molecules inside the channel, are factors not directly accounted for in the MSM scheme. However, they should influence the translational entropy loss due to ligand association to the protein, 58,59 which in turn should facilitate the ligand migration in Mt-TrHbN, explaining the difference between the k on (·NO) values in the two proteins.. These results show the effect that may be played by easily diffusive tunnels, as the ones present in Mt-TrHbN, compared to the more impeded passage seen in Mt-TrHbO.

Conclusions
Our results attribute the lower rate for O 2 binding to Mt-TrHbN to the displacement of the water molecule anchored by GlnE10 and TyrB11 distal residues, a step required to enable the binding of O 2 to the heme iron. The release of the solvent molecule is the rate limiting step, whereas the PheE15 conformation has a mild effect on the ligand migration. Regarding NO migration, our calculations showed how the protein matrix effects are more important in Mt-TrHbO, where the side chains present in the channels provide a frustrated pathway to reach the distal cavity from the tunnel entrances. Other topological effects that differ between the two proteins may also contribute to the different rates for . NO migration: i) the different solvent exposure of the tunnel entrance, which is almost 10-fold smaller in Mt-TrHbO relative to trHbN, and ii) a preconcentration process related to the larger volume of the cavity entrance in Mt-

TrHbN.
Overall, the combined approach used in thus study, which combines MSM in conjunction with the analysis of simplified systems for the characterization of specific mechanism steps, and a phenomenological kinetic model, allowed us to interpret the differences in the kinetic rate constant determined experimentally for two representative truncated hemoglobins. This computational framework allowed us to discriminate global protein matrix effects from temporally uncoupled molecular rare events, such as water self-exchange from a position with favorable interactions, and the relative importance of the PheE15 gate in ligand migration in Mt-TrHbN.

Supporting Information Description
Additional considerations on the phenomenological models for the estimation of kinetic reaction rates are available. We also present the open and closed conformation fractions obtained from MD, volume of the internal cavities and the fitted pseudo first order kinetic rates.