Recovery of free energy branches in single molecule experiments

We present a method for determining the free energy of coexisting states from irreversible work measurements. Our approach is based on a fluctuation relation that is valid for dissipative transformations in partially equilibrated systems. To illustrate the validity and usefulness of the approach, we use optical tweezers to determine the free energy branches of the native and unfolded states of a two-state molecule as a function of the pulling control parameter. We determine, within 0.6 kT accuracy, the transition point where the free energies of the native and the unfolded states are equal.

We present a method for determining the free energy of coexisting states from irreversible work measurements. Our approach is based on a fluctuation relation that is valid for dissipative transformations in partially equilibrated systems. To illustrate the validity and usefulness of the approach, we use optical tweezers to determine the free energy branches of the native and unfolded states of a two-state molecule as a function of the pulling control parameter. We determine, within 0.6 kBT accuracy, the transition point where the free energies of the native and the unfolded states are equal. Recent developments in statistical physics [1] have provided new methods to extract equilibrium free energy differences in small systems from measurements of the mechanical work in irreversible processes (see [2,3] for reviews). In this regard, fluctuation relations [3] are generic identities that establish symmetry properties for the probability of exchanging a given amount of energy between the system and its environment along irreversible processes. If a system, initially in thermodynamic equilibrium, is strongly perturbed by fast varying a control parameter λ between two values λ 0 and λ 1 , then the system is driven out of equilibrium. The work exerted upon the system, averaged over the ensemble of all possible trajectories, reads W = λ1 λ0 (∂H/∂λ)dλ where H is the system Hamiltonian. According to the second law of thermodynamics, W is always greater than the free energy difference between the initial and final states, ∆G = G(λ 1 ) − G(λ 0 ). The Crooks fluctuation relation [4] extends the predictive power of the Second Law by establishing a symmetry relation for arbitrary functionals of a trajectory Γ measured along a nonequilibrium process (forward or F process) and its time reversed one (reverse or R process). In the forward process the system starts in equilibrium at λ 0 and λ is varied from λ 0 to λ 1 for a time t f according to an arbitrary protocol λ(t) (i.e., λ 1 = λ(t f )). In the reverse process the system starts in equilibrium at λ 1 and λ is varied from λ 1 to λ 0 following the time reversed scheme, given by λ(t f − t). In its most general form the Crooks fluctuation relation reads [4] where F stands for an arbitrary functional of the forward trajectories the system can take through phase space, β is the inverse of the thermal energy k B T where k B is the Boltzmann constant and T the temperature of the environment. In this relation,F is the time reversal of F, while the averages • F(R) are taken over the ensemble of all possible forward (reverse) trajectories. The particular case F = δ(W − W (Γ)) yields a relation between work distributions along the forward and reverse processes, This relation has been experimentally tested and used to extract free energy differences in single molecule experiments [5,6,7]. A thorough discussion on its validity domain can be found in [4] Fluctuation relation under partial equilibrium conditions. By considering only the trajectories that go from one specific subset of configurations to another one, Maragakis et al. [8] have derived another relation useful to extract free energy differences between subsets of states. In principle, the validity of Eq. (1) is restricted to initial conditions that are Gibbsian over the whole phase space S (what we might call global thermodynamic equilibrium). It is, however, possible to extend Eq. (1) to the case where the initial state is Gibbsian but restricted over a subset of configurations (what we might call partial thermodynamic equilibrium). A relation mathematically similar to Eq. (1) can be derived, but involving nonequilibrium processes that are in partial (rather than global) equilibrium. It is useful to rephrase here the derivation in such a way to emphasize the role played by partial equilibrium. As we will see this makes it possible to experimentally determine the free energy of coexisting states for values of λ such that the system is never globally equilibrated.
Let P eq λ,S (C) denote the partially equilibrated (i.e., Boltzmann-Gibbs) distribution for a given value of λ. Such distribution is restricted over a subset S of configurations C contained in S (i.e., C ∈ S ⊆ S). The case S = S corresponds to global equilibrium: P EQ λ (C) ≡ P eq λ,S (C). Partially equilibrated states satisfy P eq λ,S (C) = P EQ λ (C)χ S (C)Z λ,S /Z λ,S , where χ S is the characteristic function defined over the subset S (χ S (C) = 1 if C ∈ S and zero otherwise), and Z λ,S is the partition function restricted to the subset S , i.e., Z λ,S = C∈S exp(−βE λ (C)), with E λ (C) the energy function of the system for a given λ and C. Given a forward trajectory Γ, going from configuration C 0 when λ = λ 0 to C 1 for λ = λ 1 , let S 0 (S 1 ) be the subset of S over which the system is partially equilibrated at λ 0 (λ 1 ). Consider now the following transformation of the functional F in Eq. (1): F(Γ) → χ S0 (C 0 )F(Γ)χ S1 (C 1 ). Under previous conditions the following identity can be proved (see Supp. Mat.): where the average • is now restricted to forward (reverse) trajectories that start in partially equilibrated state S 0 (S 1 ) at λ 0 (λ 1 ) and end in S 1 (S 0 ) at ) stands for the probability to be in S 1 (S 0 ) at the end of the forward (reverse) process defined above, and ∆G S1,λ1 S0,λ0 = G S1 (λ 1 ) − G S0 (λ 0 ) is the free energy difference between partially equilibrated states S 0 and S 1 . In the following, we will drop the subscript (F, R), leaving the direction of the arrow to distinguish forward from reverse. Moreover, we will adopt the shorthand notation P S1 S0 ≡ p S0→S1 /p S0←S1 . If F = 1 we obtain a generalization of the Jarzynski equality, P S1 S0 exp[−β(W − ∆G S1,λ1 S0,λ0 )] = 1. Whereas for the particular case F = δ(W − W (Γ)), we get the relation P S1 which has been used in [8] in the case of global equilibrium initial conditions. Experimental test. Here we test the validity of Eq. (3) by performing single molecule experiments using optical tweezers. Let us consider an experiment where force is applied to the ends of a DNA hairpin that unfolds/refolds in a two-state manner. The conformation of the hairpin can be characterized by two states, the unfolded state (U ) and the native or folded state (N ) -see Fig. 1. The thermodynamic state of the molecule can be controlled by moving the position of the optical trap relative to a pipette (Fig. 2, upper panel). The relative position of the trap along the x-axis defines the control parameter in our experiments, λ = x. Depending on the value of x the molecule switches between the two states according to a rate that is a function of the instantaneous force applied to the molecule [9]. In a ramping protocol the value of x is changed at constant pulling speed from an initial value x 0 (where the molecule is always folded) to a final value x 1 (where the molecule is always unfolded) and the force f (measured by the optical trap) versus distance x curves recorded. By computing i) the fraction of forward trajectories (i.e., increasing x) that go from N (≡ S 0 ) x 0 Schematic picture of the free energy landscape in a two-state folder as a function of the reaction coordinate. The blue and red colors represent the subsets of configurations that define the N and U states respectively. (b) Depending on the value of the control parameter x, the shape of the free energy landscape is tilted toward one state (either N or U ). For each value of x, the Boltzmann-Gibbs equilibrium value of the force restricted to each state defines the native (blue) and unfolded (red) force-distance branches. (c) During a ramping protocol, x is changed at a constant pulling speed from x0 to x1 (from x1 to x0 in the reverse case) and the molecule can visit both states as indicated by the colored circles. In order to measure GU (x) (all free energies are computed with respect to GN (x0)) at a given value of x (gray area), only forward and reverse trajectories that are in N at x0 and in U at x have to be considered (which is true only for the reverse trajectory in the example shown in the picture). at x 0 to U (≡ S 1 ) at x and ii) the fraction of reverse trajectories (decreasing extension) that go from U at x to N at x 0 , we can determine P U N . Then by measuring the corresponding work values for each of these trajectories, we can use Eq. (3) to estimate the free energy of the unfolded branch G U (x) as a function of x. By repeating the same operation with N instead of U , the free energy of the folded branch G N (x) can be measured as well. Note that we adopt the convention of measuring all free energies with respect to the free energy G N (x 0 ) of the native state at x 0 . We are also able to compute the free energy difference between the two branches, We have pulled a 20 bps DNA hairpin using a miniaturized dual-beam laser optical tweezers apparatus [10]. Molecules have been pulled at two low pulling speeds (40 and 50 nm/s) and two fast pulling speeds (300 and 400 nm/s), corresponding to average loading rates ranging  between 2.6 and 26 pN/s, from x 0 = 0 to x 1 = 110.26 nm (for convenience we take the initial value of the relative distance trap-pipette equal to 0). A few representative force-distance curves are shown in Fig. 2 (inset of lower panel). We have then selected a value x =x = 61.75 nm, close to the expected coexistence value of x where both N and U states have the same free energy (see below). Such value of x is chosen in order to have good statistics for the evaluation of the unfolding and refolding work distributions. The system is out of equilibrium at the four pulling speeds. To extract the free energy of the unfolded branch G U (x), we have measured the work values x f dx along the unfolding and refolding trajectories, respectively, and then determined the distributions P N →U (W ) and P N ←U (−W ). In the main panel of Fig. 2 we show the work distributions obtained for a slow and fast pulling process. Note that the support of the unfolding work distributions is bounded by the maximum amount of work that can be exerted on a molecule between x 0 andx. This bound corresponds to the work of those unfolding trajectories that have never unfolded before reachingx.  As a direct test of the validity of Eq. (3), in the upper panel of Fig. 3 we plot the quantity log(P N →U (W )/P N ←U (−W )) + log P U N (x) against W (in k B T units). As expected, all data fall into straight lines of slope close to 1. The intersections of such lines with the W -axis provide an estimate of G U (x). Note that both fast and slow pulling speeds intercept the horizontal axis around the same value within 0.75 k B T of error. With this method, we estimate G U (x) = 185.9(5) k B T . Note that this free energy estimate, as all the others in this paper, refers to the whole system, comprising hairpin, handles, and trap.
A more accurate test of the validity of Eq. (3) and a better estimation [11] of the free energy G U (x) can be obtained through the Bennett acceptance ratio method [12]. In Bennett's method we define the following functions: is the optimal (minimal variance) estimate of G U (x). In Fig. 3 (lower panel) we plot the function z(u) for different pulling speeds. It is quite clear that the functions z(u) are approximately constant along the u axis and cross the line z = u around the same value G U (x) = 186.0(3). A distinctive aspect of Eq. (3) is the presence of the factor P U N . If such correction was not taken into account then the fluctuation relation would not be satisfied anymore. We have verified that if P U N is not included in the analysis, then the Bennett acceptance ratio method gives free energy estimates that depend on the pulling speed (see Supp. Mat.).
Free energy branches. After having verified that the fluctuation relation Eq. (3) holds and that it can be used to extract the free energy of the unfolded (G U (x)) and folded (G N (x), data not shown) branches, we have repeated the same procedure in a wide range of x values. The range of values of x is such that at least 8 trajectories go through N (or U ) (ensuring that we get a reasonable statistical significance). The results of the reconstruction of the free energy branches are shown in Fig. 4. The two free energy branches cross each other at a coexistence value x c at which the two states (N and U ) are equally probable. This is defined by ∆G U N (x c ) = 0. We get x c ≈ 61.28 nm, which is in good agreement with another estimate, x c = 62.0(7) nm, obtained interpreting experimental data according to a simple phenomenological model (see Supp. Mat.). In the insets of Fig. 4 we zoom the crossing region. It is interesting to recall again the importance of the aforementioned correction term (P U N ) to Eq. (3). If such term is not included in the analysis then the two reconstructed branches never cross (bottom right inset). This result is incompatible with the existence of the unfolding/refolding transition in the hairpin, showing that the factor P U N is key to measure free energy branches.
Equation (2) is valid in the very general situation of partially equilibrated initial states which, however, are arbitrarily far from global equilibrium. This makes the particular case Eq. (3) a very useful identity to recover the free energy of states that cannot be observed in conditions of thermodynamic global equilibrium. We have shown how it is possible to apply Eq. (3) to recover free energy differences of thermodynamic branches of folded and unfolded states in a two-state DNA hairpin. These methods can be further extended to the recovery of free energies of non-native states such as misfolded or intermediates states.
We are grateful to M. Palassini for a careful reading of the manuscript. We acknowledge financial support from grants FIS2007-61433, NAN2004-9348, SGR05-00688 (A.M, F.R). In this section, we provide a full derivation of the relation (2) that has been introduced in the main article: where the averages • S0→S1(S0←S1) are taken over all forward (reverse) trajectories that start in partially equilibrated subset of state S 0 (S 1 ) at λ 0 (λ 1 ) and end in S 1 (S 0 ) at λ 1 (λ 0 ). Because initial conditions of the forward and reverse processes are taken in partial equilibrium, Eq. (1) allows to compute the free energy differences (∆G S1,λ0 S0,λ0 = G S1 (λ 1 ) − G S0 (λ 0 )) between subsets of states in far from global equilibrium conditions (see main text).
In relation (1), F stands for an arbitrary functional of the forward trajectories the system can take through phase space,F is the time reversal of F, W is the mechanical work exerted upon the system along the forward trajectories, and p S0→S1 (p S0←S1 ) stands for the probability to be in S 1 (S 0 ) at the end of the forward (reverse) process. In what follows,• always refers to the time reversal of •.
In order to prove the validity of this relation, we adopt a framework similar to the one used for the derivation of the generalized Crooks relation [1] (Eq. (1) of the main text). In this regard, we recommend [1] for a thorough discussion about the generality of the derivation.
Thus, we consider a system S composed of a finite set of microscopic configurations C, which evolves according to a discrete time Markovian dynamics that verifies the following detailed balance equation: where p stands for the probability transition between two successive configurations in time, that is C t and C t+1 here. E λ (C) is the energy of S when the control parameter is equal to λ and the microscopic configuration is C. This models the evolution of a system that can exchange energy with a constant temperature heat bath. We define the forward trajectory is the corresponding probability for the trajectory T C to occur during the forward process. Then by using the detailed balance condition, we get: where we have usedĈ t = C t f −t . At this point, it is useful to remind the definition of the thermodynamic quantities in this Markovian process. During the forward process, for a given trajectory T C one can define the total heat exchanged with the thermal bath, Q(T C ), and the total work exerted upon the system, W (T C ), as [2]: As a result, eq.(3) can be written as P(T C )/P(TĈ) = exp(−βQ(T C )). Moreover by multiplying the equality (3) both at the denominator and at the numerator by the respective global equilibrium probability of the initial state, one gets 2 P EQ λ0 (C 0 )P(T C )/P EQ λ1 (Ĉ 0 )P(TĈ) = exp(βW (T C ))Z λ1,S /Z λ0,S where Z λ,S is the partition function of S. Next, a useful way to restrict this relation to partial equilibrium (see main text) is to introduce the characteristic function defined over any subset S of S: χ S (C) = 1 if C ∈ S and is equal to zero otherwise, so that P eq λ,S (C) = P EQ λ (C)χ S (C)Z λ,S /Z λ,S where P eq λ,S (C) is the Boltzmann weight of C restricted to the subset of state S ∈ S. Then, from relations (3) and (4), we get: Now, • S0→S1 is an average over the forward trajectories that start in partially equilibrated states S 0 at λ 0 and end in S 1 at λ 1 . Thus, for any functional F of these trajectories, we have: where the sum is over all the possible trajectories of the system. In this relation, we have defined p S0→S1 = T C P eq λ0,S0 (C 0 )P(T C )χ S1 (C t f ), that is the probability to be in S 1 at the end of the forward process, i.e. starting in partial equilibrium in S 0 . Applying the relation (6) to F exp[−β(W −∆G S1,λ1 S0,λ0 )] and using relation (5), we successively get: where from (7) to (8), in addition to the relation (5), we have used χ S1 (C t f ) = χ S1 (Ĉ 0 ). From (8) to (9), we have used exp[β∆G S1,λ1 S0,λ0 )] , the definition of the time reverse of F, that isF(TĈ) = F(T C ), and the fact that the set of reverse trajectories is the same as the set of forward trajectories. From (9) to (10), we have simply multiplied by the same quantity both the denominator and the numerator. Finally, from (10) to (11) we have used the equivalent of relation (6) for the reverse process plus the definition p S1←S0 = TĈ P eq λ1,S1 (Ĉ 1 )P(TĈ)χ S0 (Ĉ t f ) .
This ends the derivation of relation (1).
A consequence of the above relation is that, since the term P S1 S0 ≡ p S0→S1 /p S0←S1 is obviously influenced by the pulling speed, if we ignore it and erroneously apply the usual Crooks relation we are bound to find an incorrect "free energy" difference ∆G, the value of which depends on the pulling velocity. This dependence is illustrated in Fig. 1 and, even more clearly, in Fig. 2. In Fig. 1 the slopes of the fitting lines strongly deviate from 1 as the pulling speed changes, while Fig. 2 shows how the experimental lines intersect the line z = W at different values that are separated up to 2 k B T . More important, data for fast and low pulling speeds (orange and red lines versus blue and red lines, respectively) are clustered in two distinct regions in the figure. This shows that the free energy difference ∆G systematically changes with the pulling speed.
B. Note on the data analysis procedure The work probability density functions (PDF) represented in Fig. 2(b) of the Letter have been calculated by following a histogram-free recipe recently proposed by Berg and Harris [3]. The method, in a nutshell, consists of approximating the empirical cumulative distribution function (ECDF) with a differentiable function that incorporates a suitable number of terms of the Fourier expansion of the data. The Kolmogorov test is used to decide when the difference between the ECDF and its analytical approximation may be ascribed to chance. The PDF is then evaluated simply by differentiation, while error bars can be estimated using a jackknife method [4].
This procedure is especially convenient for our data, because the statistics of the forward trajectories such that the hairpin is unfolded at x = x c can get so sparse that the traditional, histogram-based method becomes excessively dependent on the number and origin of the bins. A small downside, on the other hand, is that Berg and Harris' technique (like the Kolmogorov test it is based upon) is more sensitive in the central region of the distribution than about the tails. Since we use the analytic values of the PDF to compute the probability ratios used for Fig. 3(a) in the Letter (or Fig. 1 in the appendices), the slight misrepresentation of the extrema of the unfolding work distributions results in the small curvature that can be noticed in those figures.

III. DERIVATION OF xc FROM A SIMPLIFIED MODEL
Let us consider a pulling experiment where a two-state-like molecule is driven from an initial configuration characterized by the control parameter x 0 x c to some final x 1 x c , where x c is the value of x for which the states N and U have same free energy. In this section, a simplified version of the theoretical model described in [5] is introduced, that allows the estimation of x c in terms of quantities easily measured in single molecule experiments.
Let us assume that the Thermodynamic Force-Distance Curve (TFDC, i.e., the force-distance curve that we would observe if we were performing the experiment in equilibrium conditions) is shaped like in Fig. 3, that is, it jumps from the line f (x) = f 0 + k eff x to the line f (x) = f 0 + k eff x − ∆f at some coexistence point x = x c . This is not the most general case, but is a useful approximation for the experimental setting we worked with.
The coordinates of the points in From the experimental data, it is easy to measure directly f 0 , k eff , and ∆f . In the case of our experiment on the where the uncertainty is estimated as standard error. Moreover, performing many pulling experiments and applying the Crooks relation [1] to the resulting forward and reverse work distributions, we are able to determine the reversible work W rev , that corresponds to the entire area below the TFDC from x = x 0 to x = x 1 . From the knowledge of the three parameters (14) and of the reversible work W rev = 353.3(4) k B T , it is possible to determine x c as we explain below.
First, we introduce the notion of coexistence force f c , defined as the force at the midpoint between C and D. Then, we baptize B and E the points where the coexistence force is reached on the folded and the unfolded line, respectively. In other words, where The area (red in Fig. 3) below the TFDC between B and E is what in [5] is called W rip . The blue area is (f 0 + k eff x 0 )(x 1 − x 0 − ∆x), while the green area is k eff (x 1 − x 0 − ∆x) 2 /2, hence On the other hand, it is also true that Combining Eqs. (15) and (16), and solving with respect to x c , we find that is (using our experimental conditions T = 24 • C, x 0 = 0, and x 1 = 110.26 nm), x c = 62.0(7) nm .
This may be compared to our experimental curves for the free energy, which cross each other around x ≈ 61.28 nm. As a consistency check of the model we are using, we can notice that the parameter ∆x that represents the length of the unfolded hairpin (actually, the difference in length between the folded and unfolded states) is 19.0(6) nm, while the coexistence force is 13.4(3) pN, both values being consistent with hopping experiments carried out with the same molecule (unpublished data).