Article ID: 10361344
Journal: Pattern Recognition
Published Year: 2005
Pages: 9
File Type: PDF
Abstract
Any statistical model based on training encounters sparse configurations, that is, configurations that were not encountered (seen) during the training phase. This inherent problem is a major challenge for many scientific communities. The statistical estimation of rare events is usually performed with the maximum likelihood (ML) criterion. However, it is well known that the ML estimator is sensitive to extreme values and is therefore unreliable. To address this challenge, we propose a novel approach based on probabilistic logic (PL) and the minimal perplexity criterion. In our approach, configurations are treated as probabilistic events, such as predicates related through logical connectors. The method was applied to estimate word trigram probabilities from a corpus. Experiments conducted on several test sets show that the PL method with minimal perplexity outperforms both the “Absolute Discounting” and the “Good-Turing Discounting” techniques.
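To make the sparsity problem concrete, the following is a minimal Python sketch of ML trigram estimation, a simplified form of absolute discounting, and the perplexity measure used to compare estimators. It is not the PL method of the paper, whose details the abstract does not give. The function names, the toy corpus, the discount `d=0.5`, and the uniform backoff over the vocabulary are all assumptions made for illustration.

```python
import math
from collections import Counter

def trigrams(tokens):
    """Return the consecutive word triples of a token sequence."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

def train_counts(tokens):
    """Count trigrams and their two-word histories."""
    tri = Counter(trigrams(tokens))
    hist = Counter()
    for (w1, w2, _w3), c in tri.items():
        hist[(w1, w2)] += c
    return tri, hist

def p_ml(tri, hist, w1, w2, w3):
    """ML estimate c(w1 w2 w3) / c(w1 w2); 0.0 when the history is unseen."""
    h = hist[(w1, w2)]
    return tri[(w1, w2, w3)] / h if h else 0.0

def p_absolute_discount(tri, hist, vocab, w1, w2, w3, d=0.5):
    """Absolute discounting, redistributing the discounted mass uniformly
    over the vocabulary (a simplifying assumption, not a lower-order model)."""
    h = hist[(w1, w2)]
    if h == 0:
        return 1.0 / len(vocab)
    # Number of distinct words seen after this history.
    seen_types = sum(1 for (a, b, _c) in tri if (a, b) == (w1, w2))
    discounted = max(tri[(w1, w2, w3)] - d, 0.0) / h
    reserved = d * seen_types / h
    return discounted + reserved / len(vocab)

def perplexity(prob_fn, tokens):
    """exp of the mean negative log-probability over the trigrams of tokens;
    infinite as soon as one trigram receives probability 0."""
    log_sum, n = 0.0, 0
    for w1, w2, w3 in trigrams(tokens):
        p = prob_fn(w1, w2, w3)
        if p == 0.0:
            return math.inf
        log_sum += math.log(p)
        n += 1
    return math.exp(-log_sum / n)

# Toy corpus (hypothetical). The test trigram ("on", "the", "cat")
# never occurs in the training text.
train = "the cat sat on the mat the cat ate".split()
test = "the cat sat on the cat ate".split()
tri, hist = train_counts(train)
vocab = set(train)

ppl_ml = perplexity(lambda a, b, c: p_ml(tri, hist, a, b, c), test)
ppl_ad = perplexity(
    lambda a, b, c: p_absolute_discount(tri, hist, vocab, a, b, c), test
)
print(ppl_ml)  # inf: the ML estimator assigns probability 0 to the unseen trigram
print(ppl_ad)  # finite: discounting reserves some mass for unseen trigrams
```

On this toy data the ML estimator yields an infinite perplexity because a single unseen trigram receives zero probability, whereas the discounted estimator stays finite. This is the kind of comparison the minimal perplexity criterion relies on.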
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition