Article code: 5132215
Journal code: 1491513
Publication year: 2017
English article: 12 pages, PDF
Full-text version: free download
English title of the ISI article
Estimation of missing values in a food property database by matrix completion using PCA-based approaches
Related topics
Engineering and Basic Sciences, Chemistry, Analytical Chemistry
English abstract


- Matrix completion applied in the context of new food product development.
- Early stopping significantly improves the accuracy of matrix completion.
- VBPCA and TSRE provide more accurate estimates than IPCAE, TSR and IPCA.
- Matrix completion can help identify properties above or below the average.

In this work, five matrix completion algorithms were investigated for the estimation of missing values in a food property database: iterative PCA with (IPCAE) and without (IPCA) early stopping, trimmed scores regression with (TSRE) and without (TSR) early stopping, and variational Bayesian PCA (VBPCA). Matrix completion was applied to a food property database (31 properties × 663 observations) developed by meta-analysis for new food product development, a novel application of matrix completion. In this database, 68.7% of the values were missing. VBPCA and TSRE were the most accurate algorithms and explained on average 42% and 40%, respectively, of the variance of the missing values. Incorporating an early stopping step in the TSR and IPCA algorithms decreased overfitting and significantly improved their accuracy. The accuracy of the missing value estimates varied significantly between properties: the coefficient of determination for each property with VBPCA ranged from 0.02 to 0.84. The accuracy of the missing value estimates was higher when properties known for only a few observations were included in the database, indicating that the matrix completion algorithms successfully used the additional information that those properties provided to improve the estimation of the other properties in the database. For 17% of the database, the matrix completion algorithms could identify whether the missing value was above or below the average value of the property with a confidence level above 90%, providing additional information for product characterization at no experimental cost.
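To make the idea of iterative PCA with early stopping more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not say how the stopping point is chosen, so the sketch assumes a random hold-out of known entries used as a validation set. The function names (`ipca_impute`, `r_squared`) and all parameters (`n_components`, `val_fraction`, `patience`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def ipca_impute(X, n_components=2, max_iter=200, val_fraction=0.1,
                patience=5, seed=0):
    """Fill NaN entries of X by iterative PCA (illustrative sketch).

    Early stopping is an assumption of this sketch: a random subset of
    known entries is hidden, and iteration stops once the error on that
    subset has not improved for `patience` consecutive iterations.
    Setting val_fraction=0 disables early stopping (plain iterative PCA).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    known = ~np.isnan(X)
    # Hide a random subset of the known entries as a validation set.
    val = known & (rng.random(X.shape) < val_fraction)
    train = known & ~val

    # Start from the column means of the training entries.
    col_sum = np.where(train, X, 0.0).sum(axis=0)
    col_mean = col_sum / np.maximum(train.sum(axis=0), 1)
    filled = np.where(train, X, col_mean)

    use_val = bool(val.any())
    best_err, best, stall = np.inf, filled, 0
    for _ in range(max_iter):
        # Fit a rank-limited PCA model to the currently filled matrix.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        # Keep training entries fixed; update every other cell.
        filled = np.where(train, X, recon)
        if not use_val:
            best = filled
            continue
        # Root-mean-square error on the hidden known entries.
        err = np.sqrt(np.mean((recon[val] - X[val]) ** 2))
        if err < best_err:
            best_err, best, stall = err, filled, 0
        else:
            stall += 1
            if stall >= patience:
                break
    # Restore all known values, including the validation entries.
    return np.where(known, X, best)


def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance in y_true explained."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In a setting like the one described, `X` would be the observations-by-properties matrix with `NaN` for missing cells, and `r_squared` could be applied per property to artificially hidden values to obtain accuracy figures comparable in kind to the coefficients of determination reported in the abstract.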

Publisher
Database: Elsevier - ScienceDirect
Journal: Chemometrics and Intelligent Laboratory Systems - Volume 166, 15 July 2017, Pages 37-48