Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
489580 | Procedia Computer Science | 2015 | 10 Pages |
To address multilingual document classification efficiently and effectively, we claim that a synergy between classical IR techniques, such as the vector model, and advanced data mining methods, especially Formal Concept Analysis, is particularly appropriate. In this paper, we propose a new statistical approach for extracting inter-language clusters from multilingual documents, based on Closed Concepts Mining and the vector model. Formal Concept Analysis techniques are applied to extract Closed Concepts from comparable corpora; these Closed Concepts and vector models are then exploited in the clustering and alignment of multilingual documents. An experimental evaluation is conducted on the French-English bilingual document collection of CLEF’2003. The results confirm that the synergy between Formal Concept Analysis and the vector model is fruitful for extracting bilingual classes of documents, with an interesting comparability score.
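
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy formal context (documents mapped to sets of index terms), enumerates closed concepts by brute force, and compares documents with a binary cosine similarity as a stand-in for the vector model. The data, the function names (`extent`, `intent`, `closed_concepts`, `cosine`), and the binary weighting are all illustrative assumptions; the paper's actual mining algorithm and term weighting are not given in the abstract.

```python
import math
from itertools import combinations

# Hypothetical toy context: document id -> set of index terms.
CONTEXT = {
    "d1": {"bank", "loan"},
    "d2": {"bank", "loan", "rate"},
    "d3": {"bank", "rate"},
}


def extent(terms, context):
    """Documents that contain every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)


def intent(docs, context):
    """Terms shared by every document in `docs`."""
    if not docs:
        # By convention, the empty document set shares all terms.
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*(set(context[d]) for d in docs)))


def closed_concepts(context):
    """Brute-force enumeration of (extent, intent) pairs.

    Exponential in the number of terms, so only suitable for tiny
    examples; real closed-concept miners avoid this enumeration.
    """
    all_terms = sorted(set().union(*context.values()))
    concepts = set()
    for k in range(len(all_terms) + 1):
        for combo in combinations(all_terms, k):
            docs = extent(frozenset(combo), context)
            concepts.add((docs, intent(docs, context)))
    return concepts


def cosine(a, b):
    """Cosine similarity between two term sets seen as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    for docs, terms in sorted(closed_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(docs), sorted(terms))
    print(cosine(CONTEXT["d1"], CONTEXT["d2"]))
```

In this sketch each closed concept pairs a maximal set of documents with the maximal set of terms they share, which is what makes such concepts usable as candidate clusters; the cosine score then gives one possible way to compare or align them.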