Article code: 10225956
Journal code: 1701230
Publication year: 2018
English article: 24 pages
Full-text version: PDF
English title of the ISI article
DGeoSegmenter: A dictionary-based Chinese word segmenter for the geoscience domain
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Applications
Abstract
This inspired us to build a segmenter for the geoscience subject domain. By integrating the unigram language model and deep learning, we propose a weakly supervised model, DGeoSegmenter, which is trained with words and their corresponding frequencies. We built DGeoSegmenter on the bi-directional long short-term memory (Bi-LSTM) model; its training sentences are formed by randomly extracting words and combining them. Our evaluation on geoscience reports and benchmark datasets demonstrates the effectiveness of the method: DGeoSegmenter can segment both geoscience terms and general terms. Because the proposed algorithm requires neither manually labeled datasets nor hand-crafted rules, it can easily be applied to various domains and to tasks such as information retrieval and text mining.
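The abstract states that DGeoSegmenter needs only a word list with frequencies, and that training sentences are built by randomly extracting words and combining them. A minimal Python sketch of that data-generation idea follows, under stated assumptions: words are drawn in proportion to their frequency (a unigram model), and each character is labelled with the common BMES scheme (Begin, Middle, End, Single) so that a character-level Bi-LSTM tagger could be trained on the result. The function names, the BMES scheme, the sentence length and the sample dictionary with its frequencies are hypothetical illustrations, not the authors' implementation, and the Bi-LSTM itself is omitted.

```python
import random
from typing import Dict, List, Tuple


def bmes_tags(word: str) -> List[str]:
    """Assumed BMES scheme: S = single-char word, B/M/E = begin/middle/end."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def make_pseudo_sentence(
    freq_dict: Dict[str, int], n_words: int, rng: random.Random
) -> Tuple[str, List[str]]:
    """Sample words in proportion to frequency (unigram model) and join
    them into an unsegmented sentence plus per-character tags."""
    words = list(freq_dict)
    weights = [freq_dict[w] for w in words]
    sampled = rng.choices(words, weights=weights, k=n_words)
    chars = "".join(sampled)
    tags = [t for w in sampled for t in bmes_tags(w)]
    return chars, tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover a segmentation from characters and (predicted) BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at E or S
            words.append(current)
            current = ""
    if current:  # tolerate an unterminated final word
        words.append(current)
    return words


# Hypothetical geoscience dictionary: granite, fault, stratum, "de".
geo_dict = {"花岗岩": 120, "断裂": 80, "地层": 95, "的": 500}
rng = random.Random(0)
sentence, tags = make_pseudo_sentence(geo_dict, n_words=5, rng=rng)
print(sentence, tags)
print(tags_to_words(sentence, tags))
```

Because every generated sentence comes with exact character tags, this kind of sampling yields unlimited labelled training data from the dictionary alone, which is consistent with the abstract's claim that no manually labelled corpus is needed.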
Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Geosciences - Volume 121, December 2018, Pages 1-11