Article ID: 6865208
Journal: Neurocomputing
Published Year: 2018
Pages: 12
File Type: PDF
Abstract
Large sparse datasets are very common in product recommendation applications. For scalability reasons, linear classifiers are preferred for classification tasks on such datasets. Our previous work (Li et al., 2016) studied how sparsity affects naive Bayes classification under typical data missing mechanisms. In this paper, we substantially extend that work to cover all linear classifiers and explore practical strategies for improving the accuracy of classification on large sparse data. Through experiments on real-world and synthetic data, we observe different learning curve behaviors under different missing mechanisms, and we analyze the theoretical reasons behind these observations. Our findings provide a practical guideline for determining whether, or when, it is worthwhile to obtain more data and/or to acquire the missing values in existing data. This can be highly valuable in recommendation system applications.
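To illustrate the setting described in the abstract, the following is a minimal, hypothetical sketch (not the authors' code or data): it trains a linear classifier (logistic regression) on a synthetic large sparse dataset and traces a simple learning curve, i.e., test accuracy as a function of the number of training examples. All sizes, densities, and the label-generating model are illustrative assumptions.

```python
# Hypothetical sketch: linear classification on sparse data with a crude
# learning curve. Not the experimental setup used in the paper.
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic sparse data: 20,000 examples, 5,000 features, ~0.5% nonzeros.
X = sparse.random(20000, 5000, density=0.005, format="csr", random_state=rng)

# Synthetic labels from an assumed "true" linear model plus small noise.
w_true = rng.randn(5000)
y = (X @ w_true + 0.1 * rng.randn(20000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Learning curve: fit on growing subsets of the training data and
# evaluate on the held-out test set.
for n in [500, 2000, 8000, X_train.shape[0]]:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train[:n], y_train[:n])
    acc = clf.score(X_test, y_test)
    print(f"n_train={n:6d}  test accuracy={acc:.3f}")
```

In this toy setup, how quickly accuracy saturates with more training examples is exactly the kind of learning curve behavior the paper studies under different missing-data mechanisms.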
Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence