Article ID: 11002320
Journal: Expert Systems with Applications
Published Year: 2018
Pages: 31
File Type: PDF
Abstract
Term weighting is an essential step in processing textual data and generating input vectors for machine learning algorithms. To represent documents in a computable form for a given task (such as text classification, clustering, sentiment analysis, recommendation, or information retrieval), semantic term weighting, which takes term meanings into account, is important for specific applications of machine learning. Two challenges of semantic term weighting for clinical texts are how to determine the meaning of a medical term in a given clinical text and how to assign semantic weights to a huge number of distinct terms. To address these challenges, this work proposes a two-phase framework for determining the semantic weights of terms in clinical texts. The framework derives a two-part hierarchy in which each node is a category of terms. All terms in a clinical text are classified into the categories of the hierarchy, and terms in the same leaf node are assigned the same semantic weight. Fundamentally, the deeper a category lies in the hierarchy, the higher its semantic weight. The first phase classifies all terms into categories that are commonly significant for any task, using UMLS and ICD-10. These categories form the first part of the hierarchy. The second phase flexibly organizes task-specific categories as the second part of the hierarchy, as well as subcategories of the first part, using medical domain knowledge specific to the aspect under consideration. An implementation of the framework for mortality prediction with semantic weights is validated by a comparative experimental evaluation on the well-known EMR database MIMIC II. The experimental results show that performance improves considerably when frequency-based weights are combined with semantic weights, and a paired t-test indicates that the difference is significant.
Although the proposed framework applies only to the medical domain, it can cover various tasks within that domain, because the second part (the deeper levels of the hierarchy) is flexibly organized using medical knowledge specific to the aspect under consideration.
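The idea of depth-based semantic weights combined with frequency-based weights can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the category names and paths in `CATEGORY_PATH`, the rule that a term's semantic weight equals the depth of its leaf category, the smoothed TF-IDF formula, and the choice to combine the two weights by multiplication. In the actual framework the first part of the hierarchy would come from UMLS and ICD-10 and the second part from task-specific medical knowledge.

```python
import math
from collections import Counter

# Hypothetical hierarchy: each term maps to its category path from root to leaf.
# Shallow entries stand in for the general first part; the deeper entry stands
# in for a task-specific second part. All names are invented for this sketch.
CATEGORY_PATH = {
    "sepsis": ("clinical", "disease", "infection", "high_mortality_risk"),
    "fever": ("clinical", "symptom"),
    "aspirin": ("clinical", "drug"),
}
DEFAULT_PATH = ("other",)  # fallback for terms not found in the hierarchy


def semantic_weight(term):
    """Assumed rule: the weight is the depth of the term's leaf category."""
    return float(len(CATEGORY_PATH.get(term, DEFAULT_PATH)))


def tf_idf(docs):
    """Smoothed TF-IDF per document; docs is a list of token lists."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


def combined_weights(docs):
    """Assumed combination: frequency-based weight times semantic weight."""
    return [
        {term: w * semantic_weight(term) for term, w in doc_weights.items()}
        for doc_weights in tf_idf(docs)
    ]


docs = [
    ["sepsis", "fever", "patient"],
    ["fever", "aspirin", "patient"],
]
weights = combined_weights(docs)
```

In this toy input all three terms occur once in the first document, so the ordering of their combined weights is driven by how rare each term is across documents and by how deep its category sits. Other combination rules, such as a weighted sum, would fit the same structure.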
Related Topics
Physical Sciences and Engineering; Computer Science; Artificial Intelligence