Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
1869494 | Physics Procedia | 2012 | 8 Pages |
Among numerous Chinese keyword extraction methods, Chinese characteristics were shortly considered. This phenomenon going against the precision enhancement of the Chinese keyword extraction. An extended term frequency based method(Extended TF) is proposed in this paper which combined Chinese linguistic characteristics with basic TF method. Unary, binary and ternary grammars for the candidate keyword extraction as well as other linguistic features were all taken into account. The method establishes classification model using support vector machine. Tests show that the proposed extraction method improved key words precision and recall rate significantly. We applied the key words extracted by the extended TF method into the text file classification. Results show that the key words extracted by the proposed method contributed greatly to raising the precision of text file classification.