Article ID: 558975
Journal: Computer Speech & Language
Published Year: 2016
Pages: 18 Pages
File Type: PDF
Abstract

• A novel layered semantic graph of Chinese semantic word-formation is proposed.
• The layer-weighted graph edit distance and the similarity kernel are defined.
• A new algorithm based on KFP-MCOC is used to predict word-formation patterns.
• The predictive performance of KFP-MCOC and SVM is compared.
• A statistical test showed that the accuracy of the proposed approach is statistically significant.

Natural language processing plays a critical role in intelligent computing, pattern recognition, semantic analysis and machine intelligence. For Chinese information processing, constructing predictive models of the different semantic word-formation patterns from a large-scale corpus can significantly improve the efficiency and accuracy of paraphrasing unregistered or new words, ambiguity elimination, automatic lexicography, machine translation and other applications. It is therefore necessary to find the relationship between word-formation patterns and the factors that influence them, which can be framed as a classification problem. However, because semantic word-formation data exhibit noise, anomalies, imprecision, polysemy, ambiguity, nonlinear structure and class imbalance, the multi-criteria optimization classifier (MCOC), support vector machines (SVM) and other traditional classification approaches give poor predictive performance. In this paper, based on a characteristic analysis of Chinese word-formations, we first propose a novel layered semantic graph for each disyllabic word, together with the layer-weighted graph edit distance (GED) and its similarity kernel, which embeds the graphs into a new vector space. Then, on the normalized data, MCOC with kernel, fuzzification and penalty factors (KFP-MCOC) and SVM are employed to predict Chinese semantic word-formation patterns. Our experimental results and the comparison with SVM show that KFP-MCOC based on the layer-weighted semantic graphs can increase the separation of different patterns, the predictive accuracy of target patterns and the generalization of semantic pattern classification on new compound words.
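
The abstract does not give the definitions of the layer-weighted GED or its similarity kernel, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes each word's graph is stored as a mapping from a layer name to a set of labeled edges, that nodes are uniquely labeled so the per-layer edit cost reduces to the number of edge insertions and deletions, and that similarity is an exponential of the negative distance. The layer names, weights, `gamma` and all function names are hypothetical.

```python
import math

def layer_weighted_distance(g1, g2, layer_weights):
    """Sum over layers of weight * (edge insertions + deletions).

    Assumption: nodes carry unique labels, so the edit cost within a
    layer is the size of the symmetric difference of the edge sets.
    """
    total = 0.0
    for layer, weight in layer_weights.items():
        edges1 = g1.get(layer, set())
        edges2 = g2.get(layer, set())
        total += weight * len(edges1 ^ edges2)
    return total

def similarity_kernel(g1, g2, layer_weights, gamma=0.5):
    """Map a distance to a similarity in (0, 1]; identical graphs give 1.0."""
    return math.exp(-gamma * layer_weighted_distance(g1, g2, layer_weights))

def kernel_matrix(graphs, layer_weights, gamma=0.5):
    """Pairwise similarity matrix; row i represents graph i as a vector."""
    return [[similarity_kernel(a, b, layer_weights, gamma) for b in graphs]
            for a in graphs]

# Hypothetical toy graphs with two layers each.
word_a = {"morpheme": {("m1", "m2")}, "sense": {("s1", "s2"), ("s2", "s3")}}
word_b = {"morpheme": {("m1", "m2")}, "sense": {("s1", "s2")}}
word_c = {"morpheme": {("m1", "m3")}, "sense": {("s4", "s5")}}
weights = {"morpheme": 2.0, "sense": 1.0}  # hypothetical layer weights

K = kernel_matrix([word_a, word_b, word_c], weights)
```

Each row of `K` can be read as a vector representation of one word, which a kernel classifier could consume, for example through scikit-learn's `SVC(kernel="precomputed")`. A kernel built this way from an edit distance is not guaranteed to be positive semidefinite, so that property would need to be checked for any real data.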

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing