| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 1170046 | Analytica Chimica Acta | 2008 | 10 | |
A quantitative structure–property relationship (QSPR) study was conducted on the solubility of some recently synthesized anthraquinone, anthrone and xanthone derivatives in supercritical carbon dioxide (SCF-CO2). The data set comprised 1190 solubility data points for 29 molecules measured at various temperatures and pressures. The combined data splitting-feature selection (CDFS) strategy, previously developed in our research group, was used for descriptor selection and model development. The relationship between the selected molecular descriptors and solubility was modeled with a linear method (multiple linear regression; MLR) and a nonlinear method (artificial neural network; ANN). The QSPR models were validated by cross-validation and by predicting the solubility of three external set compounds that were not used in any model development step. Both methods gave accurate predictions, although the ANN model was more accurate: the root mean square errors of prediction of the MLR and ANN models were 0.284 and 0.095, respectively, with solubility expressed as the logarithm of g solute m−3 SCF-CO2. The models selected by the CDFS method were also compared with those from the conventional stepwise feature selection method. The stepwise method produced models with more descriptors and lower prediction ability, indicating that they are over-fitted.
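The validation figures above rely on two generic ingredients: an MLR fit and the root mean square error of prediction, $\mathrm{RMSEP} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the log solubility. The sketch below shows both, plus leave-one-out cross-validation. It makes several assumptions. It does not reproduce the CDFS strategy or the authors' models. The two descriptors and all data values are hypothetical toy numbers, not taken from the study. Only the standard library is used, and the least-squares problem is solved through the normal equations to keep the block self-contained. A real workflow would more likely use a numerical library.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, *x] for x in X]  # prepend a column of ones for the intercept
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a: Matrix = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("descriptor matrix is rank deficient")
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [vk - f * vc for vk, vc in zip(a[k], a[col])]
    # The matrix is now diagonal, so each coefficient is a simple ratio
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(b: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply intercept b[0] and slopes b[1:] to each descriptor row."""
    return [b[0] + sum(bi * xi for bi, xi in zip(b[1:], x)) for x in X]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; called RMSEP when y_true is an external set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def loo_rmse(X: List[List[float]], y: List[float]) -> float:
    """Leave-one-out cross-validation: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(y)):
        b = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(b, [X[i]])[0])
    return rmse(y, preds)


# Hypothetical descriptor values and noise-free "log solubility" targets (toy data only)
X_TRAIN = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 2.0], [5.0, 5.0]]
Y_TRAIN = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in X_TRAIN]
X_EXT = [[1.0, 2.0], [6.0, 1.0]]  # hypothetical external set, never used in fitting
Y_EXT = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in X_EXT]

if __name__ == "__main__":
    coef = fit_mlr(X_TRAIN, Y_TRAIN)
    print("coefficients:", coef)
    print("LOO RMSE:", loo_rmse(X_TRAIN, Y_TRAIN))
    print("external RMSEP:", rmse(Y_EXT, predict(coef, X_EXT)))
```

Because the toy targets are exactly linear in the descriptors, both error values printed here should be close to zero. With real solubility data, they would be compared against each other in the way the MLR and ANN values are compared in the abstract.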
