Force-induced misfolding in RNA

RNA folding is a kinetic process governed by the competition of a large number of structures stabilized by the transient formation of base pairs that may induce complex folding pathways and the formation of misfolded structures. Despite of its importance in modern biophysics, the current understanding of RNA folding kinetics is limited by the complex interplay between the weak base-pair interactions that stabilize the native structure and the disordering effect of thermal forces. The possibility of mechanically pulling individual molecules offers a new perspective to understand the folding of nucleic acids. Here we investigate the folding and misfolding mechanism in RNA secondary structures pulled by mechanical forces. We introduce a model based on the identification of the minimal set of structures that reproduce the patterns of force-extension curves obtained in single molecule experiments. The model requires only two fitting parameters: the attempt frequency at the level of individual base pairs and a parameter associated to a free energy correction that accounts for the configurational entropy of an exponentially large number of neglected secondary structures. We apply the model to interpret results recently obtained in pulling experiments in the three-helix junction S15 RNA molecule (RNAS15). We show that RNAS15 undergoes force-induced misfolding where force favors the formation of a stable non-native hairpin. The model reproduces the pattern of unfolding and refolding force-extension curves, the distribution of breakage forces and the misfolding probability obtained in the experiments.


I. INTRODUCTION
Like proteins, RNAs have enzymatic, regulatory, and structural functions that are crucial for the correct operation of cells ͓1,2͔. RNA molecules are found in single-stranded form and are designed to fold into specific three-dimensional conformations, called native states. RNA folding is a kinetic process mainly governed by the interactions between complementary bases, which can lead to the formation of both native and non-native domains. As a result, folding into states that are structurally different from the native state, usually referred to as misfolding, can occur ͓3͔. Misfolded RNAs are not functional and can be harmful to organisms ͓4͔, just as misfolded proteins ͑e.g., prions͒ that are involved in several diseases ͓5͔. Folding of biomolecules, such as RNA molecules and proteins, is therefore a subject of great importance in modern biophysics. Under which conditions is misfolding prone to occur? What are the structural elements that prevent folding into the native structure? Is it possible to control misfolding by designing specific molecular sequences? To answer such questions modeling of biomolecular folding is of great help. The competition between a very large number of structures, which may lead to misfolding, makes modeling of folding a difficult and challenging prob-lem in biological physics where disorder and frustration play a crucial role ͓6,7͔. RNA mostly folds in a hierarchical fashion dominated by the formation of secondary structures ͓8-10,14,15͔. In contrast to proteins where native state prediction is very difficult, it is possible to infer the correct secondary structure of RNA molecules from computer calculations ͑Mfold͒. This makes RNA folding a more tractable theoretical problem than protein folding. Bistability and misfolding in nucleic acids have been recently investigated in temperature ramping ͓11͔ and force pulling ͓12͔ experiments.
In this work we address the problem of folding and misfolding in RNA molecules that are stretched by mechanical forces. Using single molecule techniques it is nowadays possible to pull on individual molecules such as biopolymers ͑e.g., nucleic acids, proteins, sugars…͒, molecular complexes ͑e.g., motor proteins and DNA or protein fibers͒ or even to stretch cells. Single molecule techniques provide valuable information about the thermodynamics and kinetics of biomolecular processes, thereby enlarging our knowledge of fundamental processes at the molecular and cellular level ͓13͔. Among the most successful techniques in the field are optical tweezers, AFM and magnetic tweezers, all them capable of exerting forces in the piconewton ͑pN͒ range ͑1 pN=10 −12 N͒. Various studies have investigated the unfolding and refolding of individual RNA molecules using optical tweezers. RNA hairpins are typically unzipped at forces around 15 pN where base pairs are disrupted by the direct action of force. Folding kinetics in force is of current interest as it provides an alternative route to investigate the problem of molecular folding, complementary to studies of folding by varying temperature or denaturant concentration. What is the main effect of force in RNA folding? Under the action of mechanical forces, the formation of secondary contacts in RNA between bases located at distant segments of the molecule is hampered by the stretching effect of the force. Starting from a stretched state and by progressively decreasing the force, folding is partially a sequential process in contrast to the nonsequential mechanism observed in thermal folding ͓16͔. Here we introduce a phenomenological model, based on a sequential dynamics at the level of individual base pairs, which is useful to investigate folding and misfolding of RNA molecules that lack tertiary contacts. We apply it to interpret and reproduce experimental results recently obtained in the three-helix junction S15 RNA molecule, hereafter referred to as RNAS15, pulled by optical tweezers ͓17͔ ͑see Fig. 1͒. These experiments consist of repeated force cycles that start from the fully stretched molecule at high forces. The force is first decreased down to low values to let the molecule refold. Next, it is increased up to the initial value in order to unfold the molecule again ͓18͔. In this way the folding reaction can be monitored as a function of time. In such experimental conditions, we show that RNAS15 undergoes force-induced misfolding behavior as a consequence of the competition between the formation of two hairpins that cannot coexist in the same conformation. The computed misfolding probability, defined as the probability to end up in the misfolded state at the end of the relaxing process, is in good agreement with that obtained in the experiments. We are also able to reproduce the experimental unfolding and refolding force-extension trajectories, and obtain distributions of breakage forces ͑i.e., the force at which the native structure unfolds͒ that match the experimental ones at different loading rates.

II. TWO UNFOLDING PATTERNS IN RNAS15
The present work is based on previous pulling experiments ͓17͔ where optical tweezers ͓19͔ were used to induce unfolding and refolding in RNAS15 at room temperature ͑T = 298 K͒ in a solvent free of magnesium ions to avoid the formation of tertiary contacts. In these experiments a molecular construct is synthesized where the molecule RNAS15 is inserted between molecular DNA or RNA hybrid handles that provide enough space between the two beads to avoid nonspecific interactions between the molecule and the beads ͑see Fig. 1͒. The force applied on the molecular construct ͑RNAS15 plus handles͒ is then ramped at constant speed ͓20͔ between 2 pN and 20 pN at two loading rates, r =12 pN s −1 and r =20 pN s −1 . At 2 pN ͑20 pN͒, the thermodynamically stable state is the folded ͑stretched͒ state. The output of the experiments is the force-extension curve that gives the force applied to the molecule as a function of the molecular extension. During the unfolding part of the cycle ͑2 pN→ 20 pN͒, two types of unfolding curves, referred to as major and minor, are observed ͑see Fig. 1͒. The major curves correspond to approximately 95% ͑90%͒ of the trajectories at r Ӎ 12 pN s −1 ͑Ӎ20 pN s −1 ͒. The minor curves correspond to the rest Ӎ5% ͑Ӎ10% ͒.
The major curves show a cooperative transition similar to that observed in the unzipping of small RNA hairpins ͓17,18͔. Up to forces ϳ15 pN, the force-extension curve corresponds to the stretching of the molecular handles used to manipulate the molecule ͓17,18͔. The sudden large gain in the extension at forces around 15 pN is consistent with the whole opening of RNAS15 that is 77 bases long. On the other hand, the minor curves do not show the typical stretching behavior of the handles at low forces ͑f Ͻ 14 pN͒. In particular, a noncooperative transition occurs at force values between 6 and 9 pN. At these forces, the minor trajectories show large fluctuations in the extension ͑Fig. 1͒ suggesting the presence of fast conformational events where the molecule partially unfolds and refolds. Moreover, the cooperative transition observed in the minor curves at forces around 14 pN corresponds to the opening of an ϳ30 bases domain that is much shorter than the total length of the RNA molecule.
As shown in Fig. 2, the major unfolding curves are well reproduced by using an extension of the sequential kinetic describes the folding and unfolding force kinetics of single hairpins at the level of individual base pairs. It has one free parameter, which is the attempt frequency k a for the opening and closing rates of a single base pair ͑see Methods͒. We extend this model to include multibranched structures such as N in RNAS15, which is composed of a stem S that branches into two hairpins H1 and H2 ͑Fig. 2͒. We also include the effect of the instrumental setup used in the optical tweezers experiments ͓23͔.
Our numerical results show that, during the unfolding transition, the whole structure unfolds immediately after the stem opens. Accordingly, an analysis of the distribution of breakage forces predicts a transition state for the unfolding reaction that is located close to the native state ͑see Meth-ods͒. The corresponding kinetic barrier is actually generated by the presence of successive strong GC base pairs in the stem ͑throughout, G,C,A, and U stand for guanine, cytosine, adenine and uracil͒. On the other hand, this sequential dynamics applied to the structure composed of the native hairpins H 1 and H 2 does not reproduce the minor curves ͑data not shown͒. This suggests that the minor curves correspond to the unfolding of a misfolded structure, rather than to the unfolding of a structure that is partially folded into N ͑with hairpins H 1 and H 2 , but not the stem, formed͒. By using the VIENNA package for predicting RNA structures ͓25͔ we have searched for the most stable structure without the stem formed ͑in order to avoid the large cooperative rip characteristic of the major curves͒. This structure, denoted as M, is composed of two hairpins, H 1 M and H 2 M , and has a free energy of 6.3 kcal/ mol ͑Ӎ10.5k B T͒ above that of the native structure ͓see Methods and Fig. 3͑a͔͒-note that N and M cannot coexist at the same time since the same nucleotides are involved in different base pairings. Upon stretching M, numerical simulations show minorlike unfolding curves similar to the experimental ones ͓see Fig. 3͑b͔͒. In the simulations, the cooperative transition observed around 14 pN corresponds to the unfolding of the ϳ30 bases hairpin H 2 M as shown in Fig. 3͑c͒. This figure also shows that for loading rates similar to those of the experiments, H 1 M unfolds in a noncooperative way at force values between 6 and 9 pN ͑see Appendix A for a discussion on this issue͒. This corresponds to the noncooperative transition observed in the experimental ͑unfolding͒ minor curves ͑see above and Fig. 1͒. In the following, we provide quantitative evidence showing that the minor curves indeed result from the formation of M.

III. MINIMAL STRUCTURES MODEL
In order to investigate the folding and misfolding in RNAS15 we introduce a model that can be applied to any nucleic acid secondary structures. We call it the minimal structures model ͑MSM͒. The essential idea behind the model consists in associating to each type of experimental unfolding curve-two in the case of RNAS15, "major" and "minor"-a unique stable structure, whose unfolding forceextension pattern, obtained using the sequential dynamics, reproduces the experimental one. From this set of stable structures, which we call minimal structures, we generate the ensemble of configurations used to investigate both the unfolding and the refolding of the molecule. These configurations, hereafter referred to as MSM configurations, are built as follows. First, we consider all the intermediate configurations resulting from the sequential unfolding of each minimal structure. Each of these intermediate configurations is composed of hairpins that are separated by regions of unpaired bases. The ensemble of MSM configurations results from all the possible combinations of these hairpins ͑Fig. 4͒. The initial set of locally stable structures is said to be minimal since each of these structures is necessary to reproduce one of the patterns of unfolding force-extension curves obtained in the experiments. Moreover, this minimal set of structures makes simulations of kinetics affordable from a computational point of view ͑the number of configurations in the MSM grows in a polynomial way as ͟ i=1,#MS N i , N i being the total number of base pairs of the minimal structure i and #MS the total number of minimal structures͒. Although the inclusion of more FIG. 3. ͑Color online͒ Unfolding of the misfolded structure. ͑a͒ The most stable structure without stem ͑called, in this paper, the misfolded structure͒ is composed of two hairpins: H 1 M ͑orange͒ and H 2 M ͑red͒. Its free energy of formation is equal to ⌬G 1 = −29 kcal/ mol= −48.3k B T. ͑b͒ Experimental minor unfolding curves compared with numerical results obtained from the sequential unfolding of the misfolded structure on the left. ͑c͒ Curves obtained from sequential simulations ͑see the text͒ of the unfolding of the individual hairpins H 1 M and H 2 M that compose the misfolded structure. Continuous lines represent a low bandwidth average of the force-extension data.
structures might appear desirable, the implementation of the kinetics soon becomes exceedingly complicated and little is actually gained regarding comparison with the experiments. Finally, the dynamics that we implement at the level of single base pairs ͓21͔ satisfies detailed balance and is ergodic ͑i.e., each configuration in the MSM is connected through a path, made out of a finite number of successive openings and closings of base pairs, to any other configuration͒. Detailed balance and ergodicity are essential properties of the dynamics ensuring that, in the equilibrium state, all configurations are accessible and sampled according to the Boltzmann-Gibbs distribution. Detailed balance and ergodicity make the link between dynamics and thermodynamics where time averages can be replaced by ensemble averages.
During refolding there is always competition in the formation of hairpins that have bases in common ͑e.g., H 1 and H 2 M in RNAS15͒. Therefore, with more than one minimal structure, the MSM naturally leads to the formation of the different minimal structures and hence to misfolding. In RNAS15, a comparison between experiments and numerical simulations for the unfolding curves ͑Figs. 2 and 3͒ suggests one to choose N and M as the minimal structures. The total number of configurations within the MSM is on the order of a few hundreds. We have carried out numerical simulations of force cycles in the MSM in RNAS15 and observed the presence of minor and major unfolding curves in agreement with the experiments. Yet, the current model is not good enough to reproduce the experimental results as we are still not able to simultaneously reproduce the unfolding and refolding curves in a quantitative way ͑data not shown͒. In particular, by choosing a value of the attempt frequency k a that fits well the unfolding curves, we obtain refolding curves that do not match the experimental results ͑typical refolding forces are 2 pN higher in simulations than in ex-periments͒. Different causes could explain this discrepancy. First, we have neglected a large number of configurations that might compete with those of the MSM and whose presence would lead to lower refolding forces in agreement with the experimental results. In addition, the transient formation of tertiary interactions such as pseudoknots, could be relevant during the folding process.
The number of secondary structures that can be formed in RNA grows exponentially with the total number of bases. Therefore, it is impossible, in large molecules, to simulate kinetics in the full ensemble of secondary structures. Although it is possible to determine the free energy of all possible secondary structures it appears extremely difficult to implement kinetic rules between all possible configurations. The simplest strategy, in order to include the effect of additional structures on the dynamics, is to consider all possible secondary contacts that can be formed within the unpaired regions in a given MSM configuration. Because the explicit inclusion of all possible secondary structures in the dynamics is too difficult, we take advantage of approximative schemes to address such a problem. The current problem is reminiscent of that encountered in liquid or statistical field theories where an infinite class of correlation functions or observables have to be simultaneously solved. It is then common to solve the dynamics by closing the hierarchies of observables by selecting only a specific subset among all possible classes and resumming all diagrams among that subset. Here we adopt such a strategy. In the spirit of resummation techniques in statistical physics, we integrate out all these additional structures and add corrections to the free energies of the MSM configurations as explained below.

A. Estimate of the free-energy correction in the MSM
Let us consider a generic configuration C of the MSM with free energy G͑C , f͒ at a given force f. C is by definition composed of hairpins and regions of unpaired bases ͑Fig. 5͒. Starting from this configuration, we can generate additional ones by allowing the formation of secondary contacts between complementary bases within each unpaired region. The inclusion of these additional configurations in the MSM would result in a larger ensemble of configurations. This would also modify the thermodynamics of the system. Hence, in order to keep an ensemble of configurations as small as possible, the effect of such additional configurations is taken into account by adding a free-energy correction G c ͑C , f͒ to each configuration C. Subsequently, the free energy of any configuration C in the MSM can be split into three contributions as follows: G 0 ͑C͒ is the free energy of formation of the configuration C at zero force. G m ͑C , f͒ stands for the contribution to the me- chanical free energy due to the stretching of the unpaired regions that are exposed to the force. This is equal to ͐ 0 f x C ͑fЈ͒dfЈ, where x C ͑f͒ is the equilibrium average extension of the configuration C at force f. Finally, the free-energy correction at force f, G c ͑C , f͒, is added so that G͑C , f͒ includes C and all the possible secondary structures that can be formed from C using the bases of the unpaired regions. Note that some of these structures may correspond to configurations originally belonging to the MSM and, therefore, should not be included in the calculation of G c ͑C , f͒. In fact, the inclusion of such structures would lead to an incorrect and strongly biased estimation of the free-energy correction inherent to the large thermodynamic stability of all configurations that belong to the MSM. The proper estimation of G c ͑C , f͒ is therefore a very difficult task and a different strategy is required to circumvent this problem as we shall explain in the following.
In the present treatment, for the sake of simplicity, we do not consider interactions between bases of different unpaired regions. As a consequence, G c ͑C , f͒ can be decomposed as a sum of independent contributions g c i coming from each unpaired region i. Having proceeded so far, we try to get an estimation of the correction G c ͑C , f͒ that can be efficiently implemented in the numerical simulations of the kinetics. We use an annealed approximation where the contribution from each region i only depends on the number n i of bases of that region, g c i = g c ͑n i , f͒. As a result, we get G c ͑C , f͒ where N U is the total number of unpaired regions ͑see Fig. 5͒.
As the free energy of an RNA sequence depends much on its sequence, g c ͑n , f͒ should be estimated for each primary sequence. In this regard, our estimation procedure consists, first, in evaluating the average free energy of an n-base-long polynucleotide chain that is chosen within that sequence ͑see Methods͒. The average is taken over all possible segments of length n along that sequence. To this value we subtract the initial stretching free energy G m ͑n , f͒ of the n-bases-long polynucleotide and obtain F͑n , f͒. F͑n , f͒ is always a lower bound to g c ͑n , f͒ as it includes the contribution coming from the additional new configurations but also the contribution from configurations already generated by the minimal structures. In fact, by averaging over all segments covering the whole sequence, the term F͑n , f͒ gets contributions from all possible hairpins that can be formed with n bases. Therefore F͑n , f͒ is biased toward low values due to the stabilizing contribution to the free energy by the minimal structures ͑e.g., the native or the misfolded structures in the case of RNAS15͒. This bias is particularly strong at low forces where the native hairpins dominate the annealed average. How does F depend on n and f? The fact that the free energy F is an extensive variable ͑i.e., depends linearly on the size of the system n, at least for n ജ 5 where loop formation is possible͒ implies that the first derivative ‫ץ‬F / ‫ץ‬f ͑i.e., respect to the intensive variable f͒ also depends linearly on n. These properties are well confirmed by using the VIENNA package ͓25͔, which gives the exact partition function and the equilibrium free energy for any RNA sequence. In the case of RNAS15 we find F͑n , f͒Ϸa f ͑n −5͒, where the parameter a f depends linearly on f up to a certain force value f c Ӎ 12 pN for which it vanishes: Fig. 5͒. We stress that, for arbitrarily long sequences, determining a and f c is still possible by restricting the calculation of the free energy F͑n , f͒ to small values of n ͑e.g., up to n Ӎ 50͒, where a f is a linear function of f ͑Fig. 5͒.
How to proceed now in order to estimate the true correction g c ͑n , f͒? The functional form obtained for F͑n , f͒ suggests the same functional dependence for g c ͑n , f͒, albeit with a priori different parameters a and f c . f c in F͑n , f͒ is the force value where the free-energy correction vanishes and below which secondary structures become, on average, more stable than the fully unfolded or unpaired form. At forces around f c Ӎ 12 pN many other configurations can be as stable as the MSM configurations. Therefore, the value of f c is not expected to be very sensitive to the bias introduced in the annealed average by the inclusion of the MSM configurations. Thus, we keep f c Ӎ 12 pN for g c ͑n , f͒ also. Consequently, the free-energy correction term leads to only one additional free parameter in the model, which we call A. The free-energy correction finally reads g c ͑n , f͒ϷA f ͑n −5͒ with of unfolding and folding? Additional configurations naturally tend to slow down the formation of individual hairpins that belong to the minimal structures. Accordingly, the freeenergy correction modifies the closing rates rather than the opening rates of individual base pairs ͑see Methods͒. Therefore the value of the parameter A mostly determines the kinetics of folding rather than unfolding and a larger value of A tends to slow down the kinetics of folding.

B. Applying the model to RNAS15
Overall the model requires only two free parameters k a and A, in order to fit all the experimental data available in RNAS15. The parameters A = 0.3k B T and k a =10 7 s −1 lead, at both loading rates, to unfolding and refolding forceextension curves, distributions of breakage force, and misfolding probabilities that are in quantitative agreement with those found in the experiments ͑Figs. 6 and 7͒. Since no further explicit structures are necessary to reproduce the experimental data, we conclude that, in this case, a model containing the minimal structures N and M plus the free-energy correction term, is enough to explain both the unfolding and refolding kinetics of RNAS15. In this regard, we have ex-tended our analysis by including other minimal structures different from N and M and have obtained very similar results ͑data not shown͒.
Regarding the force-extension curves we note that the shoulder observed during the refolding trajectory ͓Fig. 6͑a͔͒ is mainly due to the transient formation of hairpins ͑H 1 , H 2 , H 1 M , and H 2 M ͒. On the other hand, the minor curves correspond to the unfolding of the misfolded structure M, where the hairpin H 2 M does not allow the formation of the native hairpin H 1 : M acts as a kinetic trap that impedes the formation of N. Misfolding in RNAS15 is not induced by thermal fluctuations since the free-energy difference between N and M is very large, ⌬⌬G 0 ϳ 10.5k B T. Rather it is induced by the force that tends to favor the misfolding pathway.
Finally, we note that the free-energy correction per base pair, A Ӎ 0.3k B T, is an order of magnitude smaller than the typical free energy of formation of individual base pairs ͑ϳ3k B T͒. Yet, it is necessary to include this correction ͑about 10%͒ to quantitatively reproduce the experimental features of the unfolding and refolding kinetics in RNAS15.

IV. MISFOLDING PROBABILITY
In a force cycle protocol, misfolding can be quantified by the misfolding probability P M . This is given by the probability to end up in the misfolded state at the end of the relaxing process. Multistate models of chemical reactions provide a general picture about the unloading rate dependence of this probability. The simplest model consists of a three-state system ͑native N, misfolded M, and stretched S͒ where the misfolded state M acts as a kinetic trap during the folding transition ͓Fig. 7͑a͔͒. Starting from S at high forces, and by decreasing the force at a constant rate r, the general question we ask is how P M ͑r͒ depends on r. In the general situation of a force-independent position of the kinetic barriers B N , B M ͑located at distances d N , d M from S͒, we find that P M ͑r͒ has a unique maximum located at r * ͑see Appendix B͒. However, if d N and d M depend on the force, P M ͑r͒ shows a more complex behavior where several maxima can appear ͑see Appendix B͒. This general scenario is expected to be applicable in RNAS15 where the results obtained from simulations of the MSM show a P M ͑r͒ with two maxima ͓Fig. 7͑b͔͒. From a general point of view, a P M ͑r͒ with more than one maximum suggests a complex free-energy landscape with forcedependent transition states ͑leading to force-dependent fragilities as in the case of RNA hairpins ͓26͔͒.

V. DISCUSSION AND CONCLUSIONS
In this work we have investigated the folding and unfolding behavior of nucleic acid secondary structures that are pulled by mechanical forces. To this aim we have introduced a phenomenological model ͑MSM͒ that is based on the sequential dynamics of a minimal number of structures, and the inclusion of corrections in the free energy that account for the configurational entropy contributed by the exponentially large number of neglected secondary structures. The model describes force-induced misfolding of nucleic acid secondary structures such as RNA and DNA. It can be applied to arbitrary nucleic acid sequences that can form different secondary structure and can be used to predict the phenomenology observed in dynamic force spectroscopy measurements ͑breakage force distributions, force-extension curves, and misfolding probability͒. The applicability of the approach has been shown in the case of the RNA three-helix junction S15.
The model can also be used in the prediction of different folding kinetics scenarios by implementing different sets of minimal structures. Sometimes the full applicability of the model may require the previous experimental identification of the minimal set of structures that generate the different patterns of force-extension curves. Although the model cannot predict misfolding for a given sequence it can be applied to identify possible misfolded states as well as kinetic intermediates by doing systematic in silico experiments. A useful strategy could be using the VIENNA package ͓25͔ to build up the minimal set of structures and consequently, to determine potential misfolded states by generating different sets of secondary structures for the given RNA sequence. Subsequently, one should search for the most stable structures that can be formed when native domains are not allowed. However, we are not able yet to provide a receipt that leads to the systematic determination of these states. As a consequence, the method we used for the determination of the misfolded structure must be specifically adapted to every RNA sequence.
For a given nucleic acid sequence the model only has two fitting parameters, k a and A. The first one, k a , is an attempt frequency at the level of individual base pairs which should not vary much with the specific sequence under study. In this regard, the value we report for k a in RNAS15 is in agreement with the values obtained for other RNA molecules ͓21,23͔ as expected. The second parameter A is a thermodynamic parameter related to the configurational space of the molecule, i.e., the space of secondary structures associated with a given nucleic acid sequence. In principle, for a given RNA, the larger the ensemble of MSM configurations, the smaller the correction, and hence the value of A. However, the total number of configurations included in the free-energy correction grows exponentially with the total number of base pairs of the molecule, whereas the number of configurations in the MSM grows as a power of that total number. Consequently, the inclusion of more minimal structures in the model should not change much the value of A. In addition, A is the freeenergy correction per base pair and, therefore, it should not be much more sensitive to the specific molecular sequence. Therefore it is reasonable to expect that the reported value of A Ӎ 0.3k B T is largely constant among all RNA sequences under identical environmental conditions ͑e.g., temperature and salt͒. What happens in the case of short canonical ͑i.e., fully complementary or Watson-Crick base-paired͒ hairpins? These molecules show two-state behavior and cooperative folding ͓21,23͔, yet the entropic correction might still be necessary to fully describe the kinetics of folding. In this case, there will be just one minimal structure ͑the native one͒ so the effect of the entropic correction, albeit small, could be experimentally observable. It would be very interesting to carry out future experiments capable of identifying, in generic two-state molecules, this correction of entropic origin. Finally, let us mention that a different theoretical approach is required to model the thermal denaturation of RNAs and the associated folding and misfolding mechanisms. In this case, the dissociation of base pairs is not a sequential process anymore.
Recent pulling experiments in HIV transactivation response region ͑TAR͒ RNA ͓12͔ have shown how stretching forces can help the formation of the native structure when the molecule is initially trapped in misfolded structures. Here, we have found that a mechanical force can also induce the opposite effect, by favoring misfolding pathways that are unlikely in the absence of force. It remains a challenge to apply this model to predict the detection of misfolded structures and kinetic intermediates in single molecule pulling experiments for specifically designed nucleic acid sequences.

A. Optical tweezers experimental setup
Experiments in RNAS15 were reported in a previous paper by Collin et al. ͓17͔. Buffer conditions were 100 mM tris-HCl, pH 8.1, 1 mM ethylenediaminetetraacetic acid ͑EDTA͒, free of magnesium ions, at room temperature T = 298 K. RNAS15 is attached, via RNA/DNA handles ͑Ӎ160 nm͒, to two micron-sized polystyrene beads. One bead is held fixed at the tip of a micropipette. The force is measured through the detection of the light deflected by the bead in the optical trap ͑Fig. 1͒.

B. Transition state along the unfolding pathway
From the breakage force data, one can obtain information about the transition state corresponding to the force-induced unfolding pathway using a two-state model. According to this model, the variance f of the breakage force distribution is inversely proportional to the distance x F from the transition state to the folded native state, that is, f = k B T x F . In RNAS15, this relation leads to a transition state for the unfolding reaction that corresponds to a configuration where only the first two or three base pairs of the stem are opened.

C. Extended sequential dynamics
In the sequential model of Cocco et al. ͓21͔, successive closing and opening of base pairs is restricted to take place at the base of the hairpin, defined as the first 5Ј-3Ј base pair formed ͑Fig. 4͒. The corresponding opening rates ͑k o ͒ depend on the free energy of formation of the base pairs, ⌬G 0 : where k a is an attempt frequency. The closing rates ͑k c ͒ depend on the mechanical energy loss ⌬G m , due to the shortening of the unpaired part of the molecule: k c = k a exp͑−⌬G m / k B T͒. These free energies have been estimated by thermal denaturation experiments ͓27͔ and single molecule force experiments, respectively ͓28,29͔. The attempt frequency k a is therefore the only free parameter of the model. Typical values measured by NMR fall in the range 10 7 -10 8 Hz ͓24͔. The extension of the model to multiple hairpins is depicted in Fig. 4.
In our simulations, we allow for the formation of both Watson-Crick and noncanonical ͑GA and GU͒ base pairs. The values for the free energies of formation of the different base pairs have been obtained from the VIENNA package ͑corresponding to 1 M NaCl ͓25͔͒ by adding a uniform correction in order to meet the salt condition of the buffer used in the experiments ͑100 mM tris-HCl͒. The salt correction is determined by imposing the value for free energy of formation in RNAS15 to be equal to that recovered in the experiments ͓17͔. The algorithm involves the whole experimental setup ͑handles and beads͒ within the so-called mixed ensemble where the control parameter is the distance between the optical trap and the immobilized bead ͓23͔ ͑rather than the force͒. Therefore, we include in ⌬G m the contribution of both the handles and unpaired RNA. The latter and the regions of unpaired RNA bases are described by using a wormlike chain model ͓30,31͔ with persistence lengths of 10 nm ͑handles͒ and 1 nm ͑RNA͒ and contour lengths of 0.26 nm/ bp ͑handles͒ and 0.59 nm/base ͑RNA͒. These values fit reasonably well the experimental force-extension curves in the region where the handles are stretched. Each hairpin contributes to the total extension with an additional extension of Ӎ2 nm. Finally, when taking into account our phenomenological corrections, k c becomes k c = k a exp͓−͑⌬G m + ⌬G c ͒ / k B T͔, where ⌬G c is the difference in the free-energy corrections between the open and closed configurations.

D. Free energy of an n-bases-long segment of RNAS15
Any secondary structure that is built up from an n-bases-long polynucleotide can be seen as a succession of unpaired regions and partial secondary structures closed by a base pair ͑for instance, in Fig. 5 the partial secondary structures are the hairpins͒. The free energy of such secondary structure can then be divided into the mechanical free energy corresponding to the stretching of both the unpaired regions and the base pairs that close the partial secondary structures, plus the free-energy formation of each partial secondary structure. In RNAS15, we estimate the latter using the VI-ENNA package. Computing the free energy of all the secondary structures that can be formed with the n-bases-long polynucleotide allows us to determine the partition function, and hence the free energy, of the n-bases-long polynucleotide at force f.

E. Misfolding probability in RNAS15
We describe the dynamics of the MSM using a set of master equations ͑see Appendix C͒. These equations describe the time evolution of the probability of the RNA to be in a specific MSM configuration. To get the misfolding probability we numerically integrate the set of equations. The force is decreased at a given unloading rate r, starting from the stretched state at an initial force f in = 20 pN. The misfolding probability is computed at the end of the relaxing process when the force vanishes, i.e., when t =20/ r. netic trap during the folding transition from the stretched state ͑S͒ to the native state ͑N͒. Let us consider the case of a pulling protocol where the mechanical force applied to the system decreases at a constant loading rate r. Starting from a high force value where the stretched state is the most stable one, we prove that the misfolding probability P M ͑r͒ at the end of the force releasing process shows a single maximum along the r axis.
We denote by P N ͑t͒, P M ͑t͒, and P S ͑t͒ the probability to be at time t in the state N, M, and S, respectively. The relaxation process is governed by the following set of master equations: where k a→b f is the transition rate to go from state a to state b at a given force f. Note that this model does not allow for direct transition pathways connecting N and M. Transitions between these states always pass through the stretched state S. S can then be viewed as an obligatory intermediate state of the reaction N M ͑see Fig. 9͒.

Absorbing states
In a first stage, we study the analytically tractable case where N and M are absorbing states, i.e., k N→S = 0 and k M→S = 0. The set ͑B1͒ of master equations becomes In the presence of a mechanical force that is coupled to the molecular extension, the rates k S→N , k S→M can be written as k S→N = k N exp͑−␤d N f͒ and k S→M = k M exp͑−␤d M f͒, respectively, where d N ͑d M ͒ is the distance along the reaction coordinate between S and the kinetic barrier separating the state S from the state N ͑M͒ ͑see Fig. 7͒, k N and k M are the rates at zero force, respectively, and ␤ = ͑k B T͒ −1 is the inverse of the thermal energy unit. Using these relations for the rates and considering a ramping protocol where the force decreases at a constant rate r ͑ḟ =−r͒, the set of Eqs. ͑B2͒ can be written in terms of the force as follows: Starting from an initial stretched state at very large force ͑f Ϸ ϱ, P S =1, P N = P M =0͒, the solution to Eq. ͑B3͒ is given by Let us focus now on the misfolding probability P M = P M ͑f =0͒. Starting from Eq. ͑B4͒ and after some simple manipulations, P M can be written as k M , and x = d N / d M are adimensional parameters. Interestingly, depending on the ratio x = d N / d M , two behaviors can be distinguished for the dependence of P M as a function of the adimensional rate r, i.e., of the rate r. In the following, we show that for x Ͻ 1, P M has a single maximum along the r axis, whereas for x ജ 1, P M is a decreasing function of r.
The first derivative of P M with respect to r reads ‫ץ‬ r P M = 1

͑B6͒
This clearly shows that when x ജ 1, ‫ץ‬ r P M is negative for all the ͑positive͒ values of r, i.e., P M is a decreasing function of r. When x Ͻ 1, the analysis is a bit more complicated. Let us show that ‫ץ‬ r P M = 0 has at least one solution for r Ͼ 0. First, when r → ϱ, from Eq. ͑B6͒ it is clear that ‫ץ‬ r P M is negative. Second, the following inequality holds: so that r 3 ‫ץ‬ r P M , and hence ‫ץ‬ r P M , is positive when r → 0 ͓see Eq. ͑B6͔͒. Since ‫ץ‬ r P M is a continuous function that is positive when r → 0 and negative when r → ϱ, we conclude that ‫ץ‬ r P M = 0 has at least one solution for r ജ 0. We could rigorously prove that this solution is unique. However, for the sake of lightness, we present here a proof based on physical arguments. First of all, at large r, P M decreases when r increases simply because the system does not have enough time to escape from S when the loading rate becomes too large. On the other hand, a decreasing P M when r → 0 reflects the fact that at very large forces, the probability to fold into N is much higher than the probability to fold into M, the probabilities being very low though. In this case, the more time spent at high force values, i.e., the lower r, the less probable to fold into M.
Because P M → 0 when both r → 0 and r → ϱ, P M shows at least one maximum at intermediate values of r. Moreover, in the present case where the location of the kinetic barriers does not depend on the applied force, we find that there is a single maximum for P M when x Ͼ 1.

Nonabsorbing states: The quasistatic regime
In the more realistic case where the states are not absorbing, the dependence of P M with respect to r has a different nature at low r. In this case fluctuations between M and N ͑passing through S͒ tend to populate N at low forces. Indeed, by definition, the native state N is supposed to be much more stable than the other states of the system at zero force, namely, M and S. Consequently, at low r the system has enough time to populate the native state. Or in other words, P M ͑r͒ tends to its equilibrium value Ӎexp͑−⌬⌬G 0 / k B T͒ when r → 0. In any case ͑for both x ജ 1 and x Ͻ 1͒, we hence expect that P M → exp͑−␤⌬⌬G 0 ͒Ϸ0 when r → 0, where ⌬⌬G 0 is the free-energy difference between M and N.
To conclude, we can say that in a three-state system with force-independent location of the kinetic barriers, the misfolding probability P M always shows a bell shape as shown in Fig. 10. However, the presence of the maximum may have a different cause depending on the value of the ratio x = d N / d M , i.e., depending on the relative distances of the native and misfolded kinetic barriers to the stretched state.

Force-dependent location of the kinetic barriers
Numerical simulations in RNAS15 show a complex dependence of the misfolding probability at the end of a force cycle with respect to the loading rate r ͑see Appendix C and Fig. 7͒. This suggests that RNAS15 cannot be modeled as a three-state model with force-independent position of the kinetic barriers along the reaction coordinate. Interestingly, in the three-state model described above, still one can numerically study the effect of force-dependent positions of the kinetic barriers on the shape of P M ͑r͒. Physically, a dependence of d N and d M on the force corresponds to structural changes in the corresponding transition states ͓26͔. In the case of absorbing states N and M, and for a force protocol where the force is released at constant rate r, the probabilities to be in the different states N, M, and S at a given force f read

͑B8͒
By playing with the force dependence of d N ͑f͒ and d M ͑f͒ we can obtain different shapes for the misfolding probability P M ͑f =0͒ that show several extrema along the r axis. For instance, we can choose d M ͑f͒ Ͻ d N ͑f͒ at low forces and d M ͑f͒ Ͼ d N ͑f͒ at high forces. We then obtain a misfolding probability curve as the one shown in Fig. 11. The maximum at r Ͼ 0 corresponds to a typical maximum of the forceindependent case x = d N / d M Ͻ 1, whereas the minimum at lower r is due to a crossover from x Ͻ 1 to x ജ 1. Interestingly, by solving the master equations ͑C1͒ ͑see below͒ and by imposing the misfolded structure of RNAS15 to be an absorbing state, we obtain the same kind of dependence for the misfolding probability. This suggests that in RNAS15, d M ͑f͒ ഛ d N ͑f͒ at low forces. This also suggests that in the nonabsorbing case, the low r regime observed in the numerical simulations of RNAS15 is the consequence of a quasistatic regime that tends to populate the native state.

APPENDIX C: MISFOLDING PROBABILITY IN RNAS15
In RNAS15, we can estimate the misfolding probability by using the minimal structures model ͓͑MSM͒, see the main text͔. Within this scheme, each configuration in the MSM can be labeled by C i , where i =1, ... ,N , N being the total number of MSM configurations. If P i ͑t͒ is the probability to be in the configuration C i at time t, the dynamics within the MSM is governed by the following set of master equations: where ͗j͘ counts for all the MSM configurations C j that are connected to C i via the sequential dynamics described in the Methods ͑see the main text͒. k i→j f and k j→i f are the corresponding force-dependent closing and opening rates ͑see the Methods section͒.
We numerically integrate this system by imposing a decreasing force at constant rate r with the following initial condition: the molecule is in the stretched state ͓P i ͑t =0͒ =1 if C i = S and P i ͑t =0͒ = 0 otherwise͔ and the force f = 20 pN. The curves we obtain are in good agreement with the experimental results ͑see Fig. 12͒.
Numerically, we have checked that our results remain unchanged using a coarse-grained description at the level of a few base pairs in order to get results faster ͑simulations tend to be very slow when the number of configurations starts to grow͒. In this case, we use the following two-state approximation. Let us suppose, for instance, that we coarse-grain the system of Eqs. ͑C1͒ at the level of n bp base pairs ͑typically n bp =2,3͒. If k o,c * are the effective opening and closing rates,