Article ID: 416599
Journal: Computational Statistics & Data Analysis
Published Year: 2007
Pages: 10 Pages
File Type: PDF
Abstract

Latent semantic indexing (LSI) is a method of information retrieval (IR) that relies heavily on the partial singular value decomposition (PSVD) of the term-document matrix representation of a data set. Calculating the PSVD of large term-document matrices is computationally expensive; hence in the case where terms or documents are merely added to an existing data set, it is extremely beneficial to update the previously calculated PSVD to reflect the changes. It is shown how updating can be used in LSI to significantly reduce the computational cost of finding the PSVD without significantly impacting performance. Moreover, it is shown how the computational cost can be reduced further, again without impacting performance, through a combination of updating and folding-in.
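To make the distinction between folding-in and updating concrete, here is a minimal NumPy sketch for the case where new documents are appended. The abstract does not say which updating algorithm is used, so the update below is an assumption: it follows the general shape of a Zha-Simon style document update, and all function names (`truncated_svd`, `fold_in_documents`, `update_documents`) are hypothetical. Folding-in only projects the new documents onto the existing term space and leaves the factors unchanged; updating recomputes all three factors from a small (k + p) x (k + p) matrix instead of the full term-document matrix.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k partial SVD of a term-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_documents(U_k, s_k, V_k, D):
    """Fold in new documents D (terms x p): append D^T U_k S_k^{-1} to V_k.

    U_k and s_k are left untouched, so this is cheap but the enlarged V
    no longer has orthonormal columns in general.
    """
    D_hat = D.T @ U_k / s_k          # p x k, column j divided by s_k[j]
    return np.vstack([V_k, D_hat])

def update_documents(U_k, s_k, V_k, D):
    """Assumed Zha-Simon style update: rank-k SVD of [A_k, D]."""
    k, p, n = s_k.size, D.shape[1], V_k.shape[0]
    proj = U_k.T @ D                         # part of D inside span(U_k)
    Q, R = np.linalg.qr(D - U_k @ proj)      # orthogonal remainder
    # Small (k+p) x (k+p) matrix carrying the same singular values as [A_k, D].
    K = np.block([[np.diag(s_k), proj],
                  [np.zeros((p, k)), R]])
    U_K, s_K, Vt_K = np.linalg.svd(K)
    W = np.block([[V_k, np.zeros((n, p))],
                  [np.zeros((p, k)), np.eye(p)]])
    U_new = np.hstack([U_k, Q]) @ U_K[:, :k]
    V_new = W @ Vt_K[:k, :].T
    return U_new, s_K[:k], V_new

# Toy data: 12 terms, 8 original documents, 2 new documents, rank 3.
rng = np.random.default_rng(0)
A = rng.random((12, 8))
D = rng.random((12, 2))
U_k, s_k, V_k = truncated_svd(A, 3)

V_folded = fold_in_documents(U_k, s_k, V_k, D)
U_upd, s_upd, V_upd = update_documents(U_k, s_k, V_k, D)
```

In this sketch, folding-in costs a single matrix product per batch of new documents, while updating adds a QR factorization and a small SVD. A hybrid along the lines the abstract describes would fold in several batches and run the update only occasionally; the schedule for doing so is not given in the abstract.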

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics