Article code: 1181022
Journal code: 1491550
Publication year: 2013
English article: 9-page PDF
Full-text version: free download
English title of the ISI article
Unsupervised selection of informative descriptors in QSAR study of anti-HIV activities of HEPT derivatives
Related topics
Engineering and Basic Sciences > Chemistry > Analytical Chemistry
Abstract


• An unsupervised descriptor selection method is proposed in this study.
• GSO was applied to remove collinearity and redundancy.
• The effect of each variable on the model's predictive ability was used as the criterion for the final selection of descriptors.
• Significant regression coefficients were selected by jack-knife resampling.
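
The collinearity-removal step in the highlights can be pictured with a small sketch. It assumes that GSO denotes Gram-Schmidt orthogonalization, and it uses a simple "largest remaining norm first" ordering with a hypothetical tolerance `tol`; the function name `gso_select` and the toy data are illustrative only, and the procedure used in the paper may differ.

```python
def center(col):
    """Subtract the column mean from every value."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gso_select(columns, tol=1e-8):
    """Return indices of descriptors that still carry variance after being
    orthogonalized against the descriptors already selected.

    Assumption: descriptors are ranked by their remaining squared norm;
    `tol` is a hypothetical relative cut-off, not a value from the paper.
    """
    if not columns:
        return []
    resid = [center(c) for c in columns]
    scale = max(dot(r, r) for r in resid)
    if scale == 0:
        return []  # every descriptor is constant
    selected = []
    remaining = list(range(len(columns)))
    while remaining:
        # Pick the descriptor with the most variance not yet explained.
        best = max(remaining, key=lambda j: dot(resid[j], resid[j]))
        norm2 = dot(resid[best], resid[best])
        if norm2 <= tol * scale:
            break  # everything left is (numerically) redundant
        selected.append(best)
        remaining.remove(best)
        pivot = resid[best]
        # Gram-Schmidt step: remove the pivot direction from the rest.
        for j in remaining:
            coef = dot(resid[j], pivot) / norm2
            resid[j] = [r - coef * p for r, p in zip(resid[j], pivot)]
    return selected


# Toy data: x2 is exactly 2 * x1, so only one of the two should survive.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]
x3 = [1, 0, 2, 1, 3]
print(gso_select([x1, x2, x3]))  # -> [1, 2]
```

In this toy run the descriptor with the larger variance (`x2`) is kept and its exact multiple (`x1`) is dropped; autoscaling the descriptors first would change which member of a collinear pair is retained.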

As measuring and calculating multiple descriptors per molecule becomes easier in quantitative structure–activity relationship (QSAR) studies, variable selection is becoming increasingly important for data reduction and for improving interpretability. Variable selection has been studied extensively in the context of supervised learning; in this paper, an unsupervised learning method is proposed for variable selection, and its performance is assessed on a typical QSAR data set. Because the proposed algorithm uses no real dependent variable, the variable selection is genuinely unsupervised. Instead, scores, which are linear combinations of the data variables, serve as artificial dependent variables. The data set comprises 107 derivatives of the HEPT molecule, characterized by 160 descriptors encoding the steric, hydrophobic, electronic, and structural features of the HEPT derivatives. The aims of the procedure are to generate a subset of relevant descriptors from the data set, to eliminate redundancy, and to reduce multicollinearity. The core of the methodology is the jack-knife resampling method. Here, jack-knife resampling led to the selection of 48 of the 160 initial descriptors while preserving the information in the data. Finally, applying the influence-on-prediction criterion yielded eight descriptors that represent the full set of 160. The model built with these eight descriptors has Q²_IN = 0.67, R² = 0.74, and Q²_EXT = 0.85, which indicates that the strategy adequately preserves the data structure.
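
To make the jack-knife idea concrete, here is a minimal sketch under strong simplifying assumptions. The abstract does not say which regression model is used, so a per-descriptor simple least-squares slope stands in for the regression coefficients. The artificial dependent variable is built as a linear combination of the descriptors with hypothetical weights. The names `jackknife_t` and `select_significant` and the cut-off `t_crit = 2.0` are illustrative and not taken from the paper.

```python
import math


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def jackknife_t(x, y):
    """Ratio of the full-data slope to its leave-one-out jack-knife
    standard error."""
    n = len(x)
    full = slope(x, y)
    # Re-estimate the slope n times, each time leaving one sample out.
    loo = [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((b - mean_loo) ** 2 for b in loo))
    if se == 0:
        return math.inf if full != 0 else 0.0
    return full / se


def select_significant(columns, score, t_crit=2.0):
    """Keep descriptors whose jack-knife ratio exceeds t_crit in magnitude
    (t_crit is an assumed cut-off)."""
    return [j for j, col in enumerate(columns)
            if abs(jackknife_t(col, score)) > t_crit]


# Toy descriptors and an artificial dependent variable ("score") formed as
# a linear combination of them; the weights are hypothetical.
columns = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [2, 1, 4, 3, 6, 5, 8, 7],
]
weights = [0.6, 0.1, 0.3]
score = [sum(w * col[i] for w, col in zip(weights, columns))
         for i in range(len(columns[0]))]
print(select_significant(columns, score))
```

A descriptor whose coefficient changes sign or magnitude substantially when single samples are left out gets a small ratio and is dropped, which is the sense in which only stable, significant coefficients are retained.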

Publisher
Database: Elsevier - ScienceDirect
Journal: Chemometrics and Intelligent Laboratory Systems - Volume 128, 15 October 2013, Pages 135–143