Calculating the Partition Coefficients of Organic Solvents in Octanol/Water and Octanol/Air

Partition coefficients define how a solute is distributed between two immiscible phases at equilibrium. The experimental estimation of partition coefficients in a complex system can be an expensive, difficult, and time-consuming process. Here a computational strategy to predict the distributions of a set of solutes in two relevant phase equilibria is presented. The octanol/water and octanol/air partition coefficients are predicted for a group of polar solvents using density functional theory (DFT) calculations in combination with a solvation model based on density (SMD) and are in excellent agreement with experimental data. Thus, the use of quantum-chemical calculations to predict partition coefficients from free energies should be a valuable alternative for unknown solvents. The obtained results indicate that the SMD continuum model in conjunction with any of the three DFT functionals (B3LYP, M06-2X, and M11) agrees with the observed experimental values. The highest correlation to experimental data for the octanol/water partition coefficients was reached by the M11 functional; for the octanol/air partition coefficient, the M06-2X functional yielded the best performance. To the best of our knowledge, this is the first computational approach for the prediction of octanol/air partition coefficients by DFT calculations, which has remarkable accuracy and precision.


■ INTRODUCTION
Physical properties of molecules can be used, for example, to make predictions about the environmental fate of unknown solvents. The lack of physical property data may be resolved by the development of different computational methodologies for the prediction of the appropriate physicochemical properties. In particular, a variety of methods have been developed to predict partition coefficients from a chemical structure. 1−9 One of the most important prediction methods is based on quantitative structure−property relationships (QSPRs). 10−12 Various algorithms and online platforms based on QSPRs, such as AlogPs, ADMET predictor, and ACD/logD, have been developed. The main approach of these methods is based on finding the appropriate set of molecular descriptors that allow the precise reproduction of a given physical property using a large database of available experimental data. Accurate QSPR models are obtained when this method is applied to molecules that resemble the ones in the database used to build the model. Thus, the weakness of QSPR models is related to the prediction of properties of molecules that vary slightly from those used in the database. 13−15 Alternatively, partition coefficients can be predicted by taking into account the fact that this property is related to the free energy difference of a solute in different solvents. In the work of Bannan et al., 16 the computational scheme that was used consisted of molecular dynamics simulations with explicit solvent molecules to obtain transfer free energies between the solvents, which in turn were used to calculate their partition coefficients. The octanol/water and cyclohexane/water partition coefficients were obtained using the generalized AMBER force field (GAFF) and the dielectric-corrected GAFF (GAFF-DC).
Jones et al. used ab initio calculations to predict the cyclohexane/water partition coefficients for a set of 53 compounds. The free energy of transfer was calculated with several density functionals in combination with the implicit-solvent solvation model based on density (SMD), and good estimates were obtained. 17 Rayne and Forest computed air−water partition coefficients for a data set of 86 large  18 The results obtained using the three solvation models for a range of neutral and ionic compounds showed accurate air/water partition coefficients (Kaw). Better accuracy is obtained when higher levels of theory are used.
In the work of Michalik and Lukes, 19 the octanol/water partition coefficients for 27 alkane alcohols were predicted by quantum-chemical calculations with three solvation models. The results were in rather good agreement with the corresponding experimental values. When comparing their results with those obtained using other implicit-solvent models (IEFPCM or CPCM), the authors observed deviations from linearity. This mixed quantum (DFT)−QSPR analysis has recently been implemented successfully in the prediction of pKa values for carboxylic acids. 20 In the present work, the prediction of octanol/water and air/water partition coefficients for a set of 55 organic solvents was carried out by means of density functional theory (DFT) calculations. Solvation free energies were computed with various density functionals to estimate the partition coefficients. Good correlation coefficients between the calculated and experimentally measured values were obtained.

■ DATA SET
The data set consisted of 55 molecules that were selected in a previous study. 21 In that study, 150 solvents were clustered on the basis of physicochemical properties: melting and boiling points, density, water solubility, vapor pressure, Henry's law constant, logP, logKoa, and surface tension. Molecules included in the present study belong to polar or nonpolar groups. The experimental values for the octanol/water and air/water partition coefficients are presented in Table S1. The majority of these 55 molecules have been considered to be green solvents in various literature reports. 21 Some of the included molecules, such as ether−alcohols, are poorly characterized in terms of their hazards or physicochemical properties. 22

Journal of Chemical Information and Modeling
Article

■ COMPUTATIONAL METHODS
All of the calculations presented in this work were performed using the Gaussian 16 quantum chemistry package. 23 Molecular structures were generated in the more extended conformation using GaussView 5.0. 24 The geometries of all 55 molecules were optimized using the three density functionals M06-2X, 25 M11, 26 and B3LYP 27 with the 6-311+G** basis set using the continuum solvation model based on density (SMD). 28 Hessian analysis revealed no imaginary frequencies, confirming that all of the optimized structures were true minima. SMD can be used as a universal solvation model because it can be applied to any charged or uncharged solute in any type of solvent. 29 The parameters required for the solvent are the dielectric constant, refractive index, bulk surface tension, and acidity and basicity parameters. This model divides the solvation free energy into two main contributions: the bulk electrostatic contribution and the cavity-dispersion contribution.
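The calculation matrix described above (three functionals, one basis set, gas phase plus two SMD solvents) can be enumerated with a short script. The following is only an illustrative sketch, not the authors' input files: the helper name `route_line` is hypothetical, and the route keywords (`M062X`, `Opt Freq`, `SCRF=(SMD,Solvent=...)`, `n-Octanol`) are assumed spellings that should be checked against the Gaussian manual before use.

```python
# Sketch: enumerate Gaussian route sections for each functional and phase.
# ASSUMPTION: keyword spellings follow common Gaussian usage and are not
# taken from the paper; verify them against the Gaussian documentation.

FUNCTIONALS = ["B3LYP", "M062X", "M11"]  # assumed keywords for B3LYP, M06-2X, M11
BASIS = "6-311+G**"                      # basis set named in the text
SOLVENTS = {"gas": None, "water": "Water", "octanol": "n-Octanol"}


def route_line(functional: str, phase: str) -> str:
    """Return a route section for geometry optimization plus frequencies.

    Hypothetical helper: phase is "gas", "water", or "octanol".
    """
    solvent = SOLVENTS[phase]
    route = f"# {functional}/{BASIS} Opt Freq"
    if solvent is not None:
        # SMD continuum model with the chosen solvent
        route += f" SCRF=(SMD,Solvent={solvent})"
    return route


if __name__ == "__main__":
    for functional in FUNCTIONALS:
        for phase in SOLVENTS:
            print(f"{phase:8s} {route_line(functional, phase)}")
```

The `Freq` keyword corresponds to the Hessian check mentioned in the text: a structure is accepted as a minimum only if no imaginary frequencies are reported.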
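To calculate the octanol/water partition coefficient, the SMD free energies obtained in the two solvents at 298.15 K were used to calculate the standard free energy associated with the transfer of the solute from the aqueous phase (w) to octanol (o):

\[ \Delta G^{\circ}_{o/w} = G^{\circ}_{o} - G^{\circ}_{w} \]

The octanol/water partition coefficient was then calculated according to

\[ \log P_{o/w} = -\frac{\Delta G^{\circ}_{o/w}}{2.303\,RT} \]

where R is the gas constant and T = 298.15 K. To calculate the octanol/air partition coefficient, the SMD solvation free energy in octanol was obtained from the free energies of the molecule in the gas phase and in octanol:

\[ \Delta G^{\circ}_{solv,o} = G^{\circ}_{o} - G^{\circ}_{gas} \]

The octanol/air partition coefficient was then calculated according to

\[ \log K_{oa} = -\frac{\Delta G^{\circ}_{solv,o}}{2.303\,RT} \]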
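The conversion from computed free energies to partition coefficients can be sketched in a few lines, using the standard thermodynamic relation log K = −ΔG/(RT ln 10) for the transfer of a solute from one phase to another. This is a minimal sketch under stated assumptions: free energies are supplied in hartree, both phases use the same concentration standard state, and the function names and the energies in the usage section are hypothetical placeholders, not values from this work.

```python
import math

R_KJ = 8.314462618e-3           # gas constant, kJ mol^-1 K^-1
T_REF = 298.15                  # temperature used in the text, K
HARTREE_TO_KJ_MOL = 2625.4996   # energy conversion factor


def log_partition(g_target: float, g_source: float,
                  temperature: float = T_REF) -> float:
    """log10 of the partition coefficient for transfer source -> target.

    Both free energies are in hartree (assumed input convention).
    """
    delta_g = (g_target - g_source) * HARTREE_TO_KJ_MOL  # kJ/mol
    return -delta_g / (math.log(10) * R_KJ * temperature)


def log_p_ow(g_octanol: float, g_water: float) -> float:
    """Octanol/water: transfer from water to octanol."""
    return log_partition(g_octanol, g_water)


def log_k_oa(g_octanol: float, g_gas: float) -> float:
    """Octanol/air: transfer from the gas phase to octanol."""
    return log_partition(g_octanol, g_gas)


if __name__ == "__main__":
    # Made-up free energies (hartree) purely for illustration.
    g_gas, g_water, g_octanol = -233.6500, -233.6580, -233.6600
    print("logP  :", round(log_p_ow(g_octanol, g_water), 2))
    print("logKoa:", round(log_k_oa(g_octanol, g_gas), 2))
```

At 298.15 K, RT ln 10 is about 5.71 kJ/mol, so each log unit of a partition coefficient corresponds to roughly 5.7 kJ/mol (about 1.36 kcal/mol) of transfer free energy. This explains why modest errors in solvation free energies translate into visible errors in logP.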

■ RESULTS
The structures of the 55 solvents under study are shown in Figure 1. All of the correlation coefficients, slopes, and intercepts for all of the molecules are collected in Tables 1 and 2. The linear correlations between the experimental and calculated values for the octanol/water and octanol/air partition coefficients for the observed organic solvent data set are presented graphically in Figures 2 and 3.
Notably, a good linear correlation was obtained for the logP values with the M11 functional. As shown in Tables 1 and 2, a variety of statistical error metrics were calculated to compare the calculated logP and logKoa values to the experimentally determined values: root-mean-square error (RMSE), Pearson correlation coefficient (R), mean absolute deviation (MAD), mean square error (MSE), mean absolute percentage error (MAPE), and standard error. The suspected outliers observed in the first data assessment (Figures S1−S3) were eliminated in the next level of linear regression by applying the 4σ rule 30 for the detection of outliers. Extremely large and extremely small values for both coefficients were observed in our data. The 4σ region represents 99.99% of the values for a normal distribution and 97% for symmetric unimodal distributions. In our case, the outliers were adequately identified and excluded from further statistical treatment of the data set. New plots were established and are presented in Figures 2 and 3.
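One way to apply a 4σ rule in a regression setting is to fit a straight line, compute the residuals, and flag any point whose residual exceeds four standard deviations. The text does not state exactly which quantity the rule was applied to, so the sketch below rests on an assumption (residuals of a single least-squares fit of experimental against calculated values), and the function names are hypothetical.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def four_sigma_mask(calc, exp, n_sigma=4.0):
    """Return a list of booleans: True = keep, False = flagged as outlier.

    ASSUMPTION: the rule is applied to the residuals of one regression pass.
    """
    slope, intercept = fit_line(calc, exp)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(calc, exp)]
    sigma = stdev(residuals)  # sample standard deviation of the residuals
    return [abs(r) <= n_sigma * sigma for r in residuals]


if __name__ == "__main__":
    # Synthetic data: a perfect line with one grossly shifted point.
    calc = [float(i) for i in range(50)]
    exp = list(calc)
    exp[25] += 10.0
    mask = four_sigma_mask(calc, exp)
    print("flagged indices:", [i for i, keep in enumerate(mask) if not keep])
```

After filtering, the regression would be refitted on the retained points only, which corresponds to the "next level of linear regression" described in the text.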
These metrics allow us to make meaningful comparisons between experimental and calculated data with the SMD solvent model and the three functionals. As previously mentioned, the best results were obtained with the M11 functional, with RMSE = 0.72 and R = 0.99.
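For readers who want to reproduce such comparisons, the error metrics named in this section can be computed as follows. This is a sketch using conventional definitions, which may differ in detail from those used for Tables 1 and 2. In particular, MAD is taken here as the mean absolute difference between calculated and experimental values, MAPE skips experimental values equal to zero, and the standard error is omitted because its definition is not given in the text. The function name is hypothetical.

```python
import math
from statistics import mean


def error_metrics(calc, exp):
    """Return RMSE, MSE, MAD, MAPE (%), and Pearson R for paired values."""
    errors = [c - e for c, e in zip(calc, exp)]
    mse = mean(err ** 2 for err in errors)
    mad = mean(abs(err) for err in errors)  # ASSUMPTION: MAD = mean |calc - exp|

    # MAPE is undefined for experimental values of zero, so those are skipped.
    relative = [abs(err / e) for err, e in zip(errors, exp) if e != 0]
    mape = 100.0 * mean(relative) if relative else float("nan")

    # Pearson correlation coefficient from centered sums.
    c_bar, e_bar = mean(calc), mean(exp)
    scc = sum((c - c_bar) ** 2 for c in calc)
    see = sum((e - e_bar) ** 2 for e in exp)
    sce = sum((c - c_bar) * (e - e_bar) for c, e in zip(calc, exp))
    pearson_r = sce / math.sqrt(scc * see)

    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAD": mad,
            "MAPE": mape, "R": pearson_r}


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from this study.
    for name, value in error_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]).items():
        print(f"{name}: {value:.4f}")
```

MAPE deserves caution for logP and logKoa because experimental values close to zero inflate the percentage error even when the absolute error is small.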
The obtained statistics show good accuracy of the computed partition coefficients with the M11 functional. In general, the signs of the calculated and experimental data were in agreement. Positive values indicate a preference for the organic phase, and negative values indicate a preference for the aqueous phase. For DFT-based SMD solvation models, this approach appears to be appropriate for obtaining good correlations with the experimental measurements in the calculation of logP. Presumably, the accuracy of the results is due to the appropriate parametrization of the SMD solvation model to yield accurate solvation free energies.

The predicted values for logKoa allowed us to evaluate the solvation free energies obtained with the SMD model. The various statistical error metrics were calculated and are depicted in Table 2. The best agreement between the calculated and experimental partition coefficients was obtained with the M06-2X functional, indicating a good estimation of the solvation free energy using the M06-2X functional. These findings provide useful information for understanding the partitioning, and the proposed computational scheme can be applied to other unknown solvents. For the other functionals, the obtained results also indicate that the approach is valid.
In all of the calculations, a single conformation (the more extended conformation) was used for each molecule in each phase. Although this conformation was verified to be a minimum in the gas phase and in both solvents, it is possible that multiple conformations in the gas phase and/or the solvent should be taken into account to obtain accurate descriptions.
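If several conformers were to be included, one common way (not used in this work, which relies on a single conformer) is to combine their free energies into an ensemble free energy, G = −RT ln Σ exp(−G_i/RT), separately for each phase before taking differences. The sketch below assumes conformer free energies in kJ/mol within one phase. The function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def ensemble_free_energy(conformer_g, temperature=298.15):
    """Boltzmann-combined free energy (kJ/mol) of a set of conformers.

    conformer_g: free energies of the individual conformers in kJ/mol.
    The lowest value is factored out to keep the exponentials well scaled.
    """
    rt = R_KJ * temperature
    g_min = min(conformer_g)
    partition_sum = sum(math.exp(-(g - g_min) / rt) for g in conformer_g)
    return g_min - rt * math.log(partition_sum)


if __name__ == "__main__":
    # Illustrative relative free energies (kJ/mol), not computed values.
    print(ensemble_free_energy([0.0, 2.0, 5.0]))
```

The ensemble value is never higher than the lowest conformer free energy, and the correction is largest when several conformers lie within a few kJ/mol of each other. Because the conformer populations can differ between the gas phase, water, and octanol, the correction does not necessarily cancel in the transfer free energy.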

■ CONCLUSIONS
Octanol/water and octanol/air partition coefficients were predicted within the framework of DFT using three density functionals (B3LYP and two popular (but different) Minnesota meta-GGA functionals, M06-2X and M11), and their accuracies were assessed. The best correlations with the experimental data were achieved with the M11 functional for the octanol/water partition coefficients and with the M06-2X functional for the octanol/air partition coefficients. The quality of the regression models was significantly improved after exclusion of points that were determined to be outliers using the 4σ rule. Thus, this computational protocol could be used as a tool for newly synthesized solvents where there is a lack of data regarding the octanol/water and octanol/air partition coefficients. The methodology should be useful for predicting missing data for environmetrics data interpretation and modeling.