Development of a Structure-Based, pH-Dependent Lipophilicity Scale of Amino Acids from Continuum Solvation Calculations

Lipophilicity is a fundamental property for characterizing the structure and function of proteins, motivating the development of lipophilicity scales. We report a versatile strategy to derive a pH-adapted scale that relies on theoretical estimates of distribution coefficients from conformational ensembles of amino acids. This is accomplished by using an accurately parametrized version of the IEFPCM/MST continuum solvation model as an effective way to describe the partitioning between n-octanol and water, in conjunction with a formalism that combines the partition coefficients of the neutral and ionic species of residues with the corresponding pKa values of the ionizable groups. Two weighting schemes are considered to derive solvent-like and protein-like scales, which have been calibrated by comparison with other experimental scales developed in different chemical/biological environments and pH conditions, as well as by examining properties such as the retention time of small peptides and the recognition of antigenic peptides. This efficient methodological strategy enables a straightforward extension to nonstandard residues.


SolvL and ProtL lipophilicity scales.
Following a previous study on the hydration free energy of the natural amino acids, S1 the N-acetyl-L-amino acid amides (CH3-CO-NH-CHR-CONH2) were chosen as molecular models. Using the backbone-dependent conformational library reported by Dunbrack and coworkers, S2-S4 a total of 572 rotamers (i.e., conformers with a probability contribution higher than 5% to the total conformational space of each residue) were compiled. These structures were then used to compute the n-octanol/water transfer free energies with the B3LYP/6-31G(d) MST S5 version of the IEF-PCM S6 model.
Computation of the distribution coefficients at a given pH (log DpH) was performed by combining the partition coefficients of the neutral and ionic species (for ionizable residues) using Eq. S1.

$$\log D_{pH} = \log\left(P_N + P_I \cdot 10^{\delta}\right) - \log\left(1 + 10^{\delta}\right) \qquad \text{(S1)}$$

where $P_N$ and $P_I$ denote the partition coefficients of the neutral and ionized species of the amino acid, and $\delta$ is the difference between the pKa of the ionizable group and the pH of the environment.
Note that Eq. S1 is one of the formalisms considered to estimate the pH-dependent lipophilicity profile of small (bio)organic compounds, S7 and it was found to satisfactorily reproduce the change in pH-dependent distribution coefficients for amino acid analogues.
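As an illustration, Eq. S1 can be evaluated with a few lines of Python. This is a minimal sketch rather than the code used in this work: the function name `log_d` and the `acid` flag are hypothetical, and the sign convention for the exponent (pH minus pKa for acids, pKa minus pH for bases, so that 10^delta is the ionized/neutral population ratio in water) is an assumption made explicit in the comments.

```python
import math


def log_d(p_neutral, p_ionic, pka, ph, acid=True):
    """Evaluate log D = log(P_N + P_I * 10**delta) - log(1 + 10**delta).

    p_neutral, p_ionic: partition coefficients (not logarithms) of the
    neutral and ionized species.
    """
    # Assumption: delta = pH - pKa for acids and pKa - pH for bases, so that
    # 10**delta equals the ionized/neutral population ratio in water.
    delta = (ph - pka) if acid else (pka - ph)
    ratio = 10.0 ** delta
    return math.log10(p_neutral + p_ionic * ratio) - math.log10(1.0 + ratio)
```

With this convention, log D approaches log P_N when the neutral species dominates and log P_I when the ionized species dominates.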
The contribution of the conformational species in water and n-octanol was accounted for by considering two weighting schemes, giving rise to the Solvent-like (SolvL) and Protein-like (ProtL) lipophilicity scales, respectively.

(i) In the SolvL scale, the contribution of each conformational state to the partition coefficient of the neutral/ionized species was determined using a Boltzmann weighting scheme, where the effective free energy was estimated by combining the internal energy of the conformer and its solvation free energy in water and n-octanol. To this end, the geometry of all rotamers was optimized at the B3LYP/6-31G(d) level of theory while keeping the backbone dihedrals fixed to the torsional values of the Dunbrack library, and single-point calculations were subsequently performed in the gas phase and in solution. The log DpH was then computed using Eq. S1, adopting the pKa values reported for ionizable residues from experimental peptide models in aqueous solution. S8,S9
(ii) In the ProtL scale, the contribution of each conformation to the partition between the two solvents was determined by using the weights reported in the Dunbrack library, which reflect the rotameric distribution in a protein environment. The pKa values of ionizable residues were taken from values in folded proteins. S10,S11
For the sake of comparison, both scales were also computed with the SMD model at the B3LYP/6-31G(d) level of theory. S12 All calculations were performed using a locally modified version of Gaussian 09. S13
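The Boltzmann weighting used for the SolvL scale can be sketched as follows. This is only one plausible way to combine the per-conformer energies, not necessarily the exact procedure followed here: the function names are hypothetical, all energies are assumed to be in kcal/mol, and the ensemble free energy in each solvent is assumed to follow from the partition function over conformers.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(free_energies, temperature=298.15):
    """Normalized Boltzmann weights for conformer free energies (kcal/mol)."""
    rt = R_KCAL * temperature
    g_min = min(free_energies)  # shift energies for numerical stability
    factors = [math.exp(-(g - g_min) / rt) for g in free_energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_free_energy(free_energies, temperature=298.15):
    """Free energy of the ensemble, G = -RT ln(sum_i exp(-G_i / RT))."""
    rt = R_KCAL * temperature
    g_min = min(free_energies)
    z = sum(math.exp(-(g - g_min) / rt) for g in free_energies)
    return g_min - rt * math.log(z)


def ensemble_log_p(e_internal, dg_solv_water, dg_solv_octanol, temperature=298.15):
    """log P (n-octanol/water) from per-conformer energies in kcal/mol.

    Assumption: the effective free energy of conformer i in a solvent is its
    gas-phase internal energy plus its solvation free energy in that solvent.
    """
    g_wat = [e + s for e, s in zip(e_internal, dg_solv_water)]
    g_oct = [e + s for e, s in zip(e_internal, dg_solv_octanol)]
    dg_transfer = (ensemble_free_energy(g_oct, temperature)
                   - ensemble_free_energy(g_wat, temperature))
    return -dg_transfer / (math.log(10) * R_KCAL * temperature)
```

In a ProtL-like variant, the weights returned by `boltzmann_weights` would be replaced by the rotamer populations taken from the library.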

Comparison with experimental hydrophobicity scales.
Due to the diversity of experimental lipophilicity scales of amino acids, generally expressed in terms of transfer free energies, comparison was made by converting them to partition/distribution coefficients, which were subsequently normalized to Gly following Eq. S2.

In Eq. S2, $\Delta\Delta G_{transf,AA}$ is the transfer free energy of a given amino acid from the aqueous phase to the organic/biological environment, and $\Delta\Delta G_{transf,Gly}$ is the transfer free energy of Gly.
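The conversion described above can be illustrated with the standard thermodynamic relation between a transfer free energy and a partition coefficient, log P = -ΔG/(2.303 RT). The sketch below is an assumption about the form of the normalization, not a transcription of Eq. S2: the function name is hypothetical, energies are assumed to be in kcal/mol, and the sign convention assumes transfer from water to the organic/biological phase.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def relative_log_p(dg_transf_aa, dg_transf_gly, temperature=298.15):
    """Lipophilicity of a residue relative to Gly, in log P units.

    Assumption: both free energies (kcal/mol) refer to transfer from water
    to the organic/biological phase, so a more negative value means a
    stronger preference for the nonaqueous phase.
    """
    rt_ln10 = math.log(10) * R_KCAL * temperature
    return -(dg_transf_aa - dg_transf_gly) / rt_ln10
```

By construction Gly maps to zero, and scales reported with the opposite transfer direction would need their sign flipped before use.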

Determination of the cumulative lipophilicity.
Most of the experimental scales in the literature compute the lipophilicity of a given peptide as the sum of the individual lipophilicities of the constituent amino acids relative to a reference residue, usually Gly or Ala. Since the MST solvation model gives atomic contributions to the transfer free energy, S14-S16 the global lipophilicity can be separated into contributions corresponding to the backbone (bb), side chain (sc), and capping groups (cg). Combination of the bb and sc contributions yields the amino acid lipophilicity (reported in Table 1). The cumulative lipophilicity of a peptide with Nres residues may be estimated by using Eq. S3, where $P^{N}_{i}/D^{pH}_{i}$ stands for the fragment (bb+sc or cg) partition/distribution coefficient, and Nres and Ncg are the total number of residues and capping groups in the peptide.
For practical applications, this simple expression is convenient when there is no explicit knowledge about the 3D structure of peptides, as may occur in structureless peptides. For our purposes here, this is the expression adopted to evaluate the lipophilicity of small, flexible peptides in solution.
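The additive estimate for structureless peptides amounts to summing fragment contributions, as in the following sketch. The function name and the dictionary-based lookup are hypothetical, and the usage below relies on placeholder numbers, not on the values of Table 1.

```python
def cumulative_lipophilicity(sequence, scale, capping=()):
    """Sum fragment log P_N / log D_pH values over a peptide.

    sequence: one-letter residue codes.
    scale: mapping from residue code to its (bb+sc) lipophilicity.
    capping: lipophilicity contributions of the capping groups (cg), if any.
    """
    missing = sorted({aa for aa in sequence if aa not in scale})
    if missing:
        raise KeyError(f"no lipophilicity value for residues: {missing}")
    return sum(scale[aa] for aa in sequence) + sum(capping)
```

For example, with a placeholder scale `{"G": 0.0, "A": 0.5, "L": 1.5}`, the call `cumulative_lipophilicity("GAL", scale)` simply adds the three residue contributions.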
On the other hand, if the 3D structure of the peptide is known from experimental (X-ray, NMR) or computational (Molecular Dynamics) approaches, then the cumulative lipophilicity may be estimated by taking into account the specific structural features of peptides/proteins, as noted in Eq. S4.
In Eq. S4, the weighting factor of each fragment stands for the fraction of solvent-exposed surface area (SASA) of the amino acid (bb+sc) or capping group (cg) according to its local structural environment in the peptide/protein. For our purposes, the SASA was determined using NACCESS. S17 In addition, two correction factors were introduced. The parameter $\alpha_i$ introduces a correction to the hydrophobic contribution when the backbone participates in a hydrogen bond (HB). This contribution amounts, on average, to 0.73 log P units per HB. S18 The occurrence of this kind of HB in a given 3D structural model was determined with the DSSP program. S19 Finally, the $\beta_i$ factor accounts for a correction due to the burial of the side chain of hydrophobic residues (Ala, Leu, Ile, Val, Pro, Phe, Trp, Met and Tyr) from water to a lipophilic environment. This contribution has been estimated to be 0.023 kcal mol⁻¹ Å⁻² according to the studies reported by Moon and Fleming for the transfer of nonpolar side chains from water into a lipid bilayer. S20 Therefore, the $\beta_i$ term has been estimated from the fraction of the buried side chain with respect to the fully buried side chain, as noted in Eq. S5.
In Eq. S5, the reference term stands for the hydrophobic contribution (in log P units) of a specific apolar residue, which was estimated as noted in Eq. S6.
In Eq. S6, $SASA^{res}_{sc}$ is the average SASA of a given residue type, R is the gas constant, and T is the temperature.
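The burial correction can be sketched numerically from the quantities named in the text. The functional forms below are assumptions inferred from the description, since Eqs. S5 and S6 are not reproduced here: the surface-tension-like coefficient (0.023 kcal mol⁻¹ Å⁻²) times the average side-chain SASA is converted to log P units by dividing by 2.303 RT, and the result is scaled by the buried fraction of the side chain. All function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
BURIAL_COEFF = 0.023   # kcal mol^-1 A^-2, value quoted in the text


def hydrophobic_contribution(sasa_sc_avg, temperature=298.15):
    """Assumed form of Eq. S6: full-burial contribution in log P units."""
    return BURIAL_COEFF * sasa_sc_avg / (math.log(10) * R_KCAL * temperature)


def burial_correction(sasa_sc_exposed, sasa_sc_avg, temperature=298.15):
    """Assumed form of Eq. S5: buried fraction times the full contribution.

    sasa_sc_exposed: side-chain SASA in the 3D structure (A^2).
    sasa_sc_avg: average side-chain SASA of the residue type (A^2).
    """
    buried_fraction = 1.0 - sasa_sc_exposed / sasa_sc_avg
    buried_fraction = min(1.0, max(0.0, buried_fraction))  # clamp to [0, 1]
    return buried_fraction * hydrophobic_contribution(sasa_sc_avg, temperature)
```

A fully exposed side chain gives no correction, whereas a fully buried one recovers the whole hydrophobic contribution of the residue.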
The values for nonpolar residues are given in Table S0.

a Estimated generally using cellular MHC/competitive/fluorescence half-maximal inhibitory concentration (IC50) assays, and exceptionally from radioactive assays. When several data were available, the binding affinity is given as the mean value together with the standard deviation.

Figure S1. Representation of SolvL (blue) and ProtL (yellow) lipophilicity scales (values relative to Gly) at physiological pH.

Figure S2.

Table 2.