Article code: 395667 | Journal code: 666000 | Publication year: 2011 | English article: 14 pages | Full text: PDF
English title (ISI article)
A trigram hidden Markov model for metadata extraction from heterogeneous references
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract

Our objective was to explore efficient and accurate extraction of metadata such as author, title and institution from heterogeneous references using hidden Markov models (HMMs). The major contributions of the research were (i) the development of a trigram, full second-order hidden Markov model that gives higher priority to words emitted in transitions to the same state, together with a corresponding new Viterbi algorithm; (ii) the introduction of a new smoothing technique for transition probabilities; and (iii) a proposed modification of the back-off shrinkage technique for emission probabilities. The effect of data set size on the training procedure was also measured. The model was compared with other related work and evaluated on three different data sets. The results showed overall accuracy, precision, recall and F1 measure of over 95%, suggesting that the method outperforms other related methods in the task of metadata extraction from references.
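The abstract names a second-order Viterbi decoder without describing it. As a rough illustration only, the Python sketch below shows a textbook trigram (second-order) Viterbi decoder, in which the lattice is indexed by pairs of states because each transition depends on the previous two states. It is not the authors' algorithm: it omits the extra priority the paper gives to words emitted in same-state transitions, and in place of the paper's new transition smoothing it swaps in plain linear interpolation of trigram, bigram and unigram estimates. The names `viterbi_trigram`, `interpolated_trans`, the `START` padding state, the field labels and the interpolation weights are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

START = "<s>"  # hypothetical padding state for the two positions before the first word


def interpolated_trans(tag_seqs: Sequence[Sequence[str]],
                       lambdas: Tuple[float, float, float] = (0.6, 0.3, 0.1)
                       ) -> Callable[[str, str, str], float]:
    """Return P(c | a, b) as a linear interpolation of trigram, bigram and
    unigram relative frequencies. Textbook smoothing, NOT the paper's new
    technique; the lambda weights are illustrative assumptions."""
    uni, bi, tri, ctx1, ctx2 = Counter(), Counter(), Counter(), Counter(), Counter()
    for seq in tag_seqs:
        padded = [START, START] + list(seq)
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            uni[c] += 1
            bi[(b, c)] += 1
            ctx1[b] += 1
            tri[(a, b, c)] += 1
            ctx2[(a, b)] += 1
    total = sum(uni.values())
    l3, l2, l1 = lambdas

    def p(a: str, b: str, c: str) -> float:
        # Unseen contexts contribute 0 and fall back on the lower orders.
        p3 = tri[(a, b, c)] / ctx2[(a, b)] if ctx2[(a, b)] else 0.0
        p2 = bi[(b, c)] / ctx1[b] if ctx1[b] else 0.0
        p1 = uni[c] / total if total else 0.0
        return l3 * p3 + l2 * p2 + l1 * p1

    return p


def viterbi_trigram(words: Sequence[str],
                    states: Sequence[str],
                    trans: Callable[[str, str, str], float],  # P(s_t | s_{t-2}, s_{t-1})
                    emit: Callable[[str, str], float],        # P(w_t | s_t)
                    ) -> List[str]:
    """Most likely state sequence under a generic second-order HMM."""
    if not words:
        return []

    def log(x: float) -> float:
        return math.log(x) if x > 0 else float("-inf")

    # delta[(u, v)]: best log-probability of a path whose last two states are u, v.
    delta: Dict[Tuple[str, str], float] = {
        (START, v): log(trans(START, START, v)) + log(emit(v, words[0])) for v in states
    }
    back: List[Dict[Tuple[str, str], str]] = [{}]  # no back-pointers at t = 0

    for t in range(1, len(words)):
        new_delta: Dict[Tuple[str, str], float] = {}
        ptr: Dict[Tuple[str, str], str] = {}
        for (w, u), score in delta.items():
            for v in states:
                cand = score + log(trans(w, u, v)) + log(emit(v, words[t]))
                if (u, v) not in new_delta or cand > new_delta[(u, v)]:
                    new_delta[(u, v)] = cand
                    ptr[(u, v)] = w  # remember the state two steps back
        delta = new_delta
        back.append(ptr)

    # Best final pair, then follow back-pointers from the last word to the first.
    u, v = max(delta, key=delta.get)
    path = [v]
    for t in range(len(words) - 1, 0, -1):
        path.append(u)
        u, v = back[t][(u, v)], u
    path.reverse()
    return path
```

In use, `interpolated_trans` would be fitted on labeled tag sequences (for example, hypothetical labels `author`, `title`, `institution`) and passed to `viterbi_trigram` together with some emission function `emit(state, word)`. Keeping pairs of states in the lattice is what makes decoding cost grow with the cube of the number of states, which is the price of conditioning on two previous states.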

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 181, Issue 9, 1 May 2011, Pages 1538–1551