A probabilistic matrix factorization algorithm for approximation of sparse matrices in natural language processing

Article ID	Journal	Published Year	Pages	File Type
6925842	ICT Express	2018	4 Pages	PDF

Abstract

This paper proposes a variation of a well-known probabilistic matrix factorization algorithm that is commonly used in data analysis and scientific computing and has recently been considered for natural language processing. The variation exploits the fact that the matrices processed in natural language processing tasks are typically sparse and rectangular, with one dimension much larger than the other, in order to ensure adequate accuracy at an acceptable computation time. Preliminary experiments on real-world textual corpora show that the proposed algorithm achieves meaningful improvements over the original one.

Keywords

Singular Value Decomposition; Latent Semantic Analysis; Natural Language Processing
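
Illustrative sketch

The abstract does not name the baseline algorithm or describe the proposed variation, so the following is only a minimal sketch of one standard probabilistic factorization, a randomized truncated SVD based on random projection, applied to a tall sparse matrix. It is assumed here purely for illustration and is not the authors' method. The function name `randomized_svd`, its parameters (`rank`, `oversample`, `n_iter`, `seed`), and the synthetic test matrix are all hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of A by random projection.

    Hypothetical helper for illustration; A may be a SciPy sparse matrix
    or a dense array of shape (m, n).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch size: target rank plus a few extra columns, capped by min(m, n).
    k = min(rank + oversample, min(m, n))

    # Sample the column space of A with a Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))

    # Power iterations improve accuracy when singular values decay slowly;
    # re-orthonormalizing at each step keeps the basis numerically stable.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A onto the basis: B = Q^T A is a small dense (k x n) matrix.
    B = (A.T @ Q).T
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


# Synthetic tall, sparse matrix standing in for a term-document matrix
# (assumed sizes and density, not data from the paper).
A = sp.random(5000, 200, density=0.01, format="csr", random_state=0)
U, s, Vt = randomized_svd(A, rank=10)
print(U.shape, s.shape, Vt.shape)
```

In this sketch the only operations that touch the large matrix are sparse-times-dense products, and the dense SVD is performed on a matrix whose smaller dimension is the sketch size, which is why this family of methods suits sparse matrices with one dimension much larger than the other.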