Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
10225956 | Computers & Geosciences | 2018 | 24 Pages |
Abstract
This inspired us to build a segmenter for the geoscience subject domain. By integrating the unigram language model and deep learning, we propose a weakly supervised model: DGeoSegmenter. DGeoSegmenter is trained with words and corresponding frequencies. We built DGeoSegmenter using the bi-directional long short-term memory (Bi-LSTM) model, which randomly extracts words and combines them into sentences. Our evaluation results using geoscience reports and benchmark datasets demonstrate the effectiveness of our method: DGeoSegmenter can segment both geoscience terms and general terms. Because the proposed algorithm requires neither manually labeled datasets nor hand-crafted rules, it can easily be applied to various domains, including information retrieval and text mining.
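The training-data idea in the abstract (draw words at random according to their frequencies, then join them into sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary `WORD_FREQ`, the helper names `bmes_tags` and `sample_sentence`, and the B/M/E/S character-tagging scheme are all assumptions chosen for illustration. The abstract does not state which labeling scheme the Bi-LSTM is trained on.

```python
import random

# Hypothetical word-frequency dictionary (illustrative values, not from the paper).
WORD_FREQ = {"granite": 50, "fault": 30, "the": 200, "ore": 20, "is": 120}


def bmes_tags(word):
    """Per-character tags: S for a one-character word, otherwise B, M..., E.

    The B/M/E/S scheme is an assumption; it is a common choice for
    character-level segmentation but is not confirmed by the abstract.
    """
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def sample_sentence(word_freq, n_words, rng):
    """Draw n_words from the unigram distribution and join them without spaces."""
    words = rng.choices(
        list(word_freq), weights=list(word_freq.values()), k=n_words
    )
    chars = [c for w in words for c in w]
    tags = [t for w in words for t in bmes_tags(w)]
    return words, chars, tags


rng = random.Random(0)  # fixed seed so the sketch is repeatable
words, chars, tags = sample_sentence(WORD_FREQ, 4, rng)
print("".join(chars))  # unsegmented pseudo-sentence
print(" ".join(tags))  # one boundary tag per character
```

Each sampled pseudo-sentence comes with boundary labels for free, because the word boundaries are known at the moment the words are joined. That is what would let a sequence tagger be trained from a dictionary alone, without manually labeled text.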
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Qinjun Qiu, Zhong Xie, Liang Wu, Wenjia Li