| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 6921061 | Computers in Biology and Medicine | 2015 | 6 | |
Abstract
Extracting expert clinic information from the Deep Web poses two challenges. The first is to judge Web forms. To address it, a novel method is proposed based on a domain model, a tree structure constructed from the attributes of query interfaces. With this model, a query interface can be classified into a domain and filled in with domain keywords. The second challenge is to extract information from the response Web pages returned by the query interfaces. To filter noisy information on these pages, a block importance model that takes both content and spatial features into account is proposed. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, and the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
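The abstract names the two feature groups of the block importance model (content and spatial) but not the features themselves. As a rough illustration only, the Python sketch below scores page blocks by mixing one content score and one spatial score and keeps the blocks that reach a threshold; it is not the authors' model. Everything in it is a hypothetical assumption: the `Block` fields, the content feature (share of non-link text), the spatial features (horizontal centrality and relative area), the weight `alpha`, the 0.5 threshold, and the example page.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered region of a response page (hypothetical representation)."""
    text: str
    link_text_len: int  # characters of text inside links (assumed feature)
    x: float            # left edge of the bounding box, in pixels
    y: float            # top edge of the bounding box, in pixels
    width: float
    height: float


def content_score(block: Block) -> float:
    """Assumed content feature: share of the block's text that is not link text."""
    n = len(block.text)
    if n == 0:
        return 0.0
    return max(0.0, 1.0 - block.link_text_len / n)


def spatial_score(block: Block, page_w: float, page_h: float) -> float:
    """Assumed spatial features: horizontal centrality averaged with relative area."""
    centre_x = (block.x + block.width / 2) / page_w
    # 1.0 when the block is centred horizontally, 0.0 when centred on a page edge
    centrality = max(0.0, 1.0 - 2.0 * abs(centre_x - 0.5))
    area = min(1.0, (block.width * block.height) / (page_w * page_h))
    return 0.5 * centrality + 0.5 * area


def importance(block: Block, page_w: float, page_h: float, alpha: float = 0.5) -> float:
    """Linear mix of content and spatial scores; alpha is a hypothetical weight."""
    return alpha * content_score(block) + (1 - alpha) * spatial_score(block, page_w, page_h)


def filter_blocks(blocks, page_w, page_h, threshold=0.5, alpha=0.5):
    """Keep only blocks whose importance reaches the (hypothetical) threshold."""
    return [b for b in blocks if importance(b, page_w, page_h, alpha) >= threshold]


# Made-up example on a 1000 x 2000 px page: a clinic listing, a navigation bar, a sidebar ad.
main = Block("Expert clinic schedule: cardiology, Monday and Wednesday", 0, 200, 300, 600, 1200)
nav = Block("Home About Contact Login", 21, 0, 0, 1000, 80)
ad = Block("Sponsored link", 14, 850, 300, 150, 600)
kept = filter_blocks([main, nav, ad], 1000, 2000)
print([b.text for b in kept])  # ['Expert clinic schedule: cardiology, Monday and Wednesday']
```

Here the navigation bar is central and wide but is almost entirely link text, and the sidebar ad is both link-only and off to the side, so only the listing survives. A fixed linear weighting is the simplest way to combine the two feature groups; the abstract does not say how the paper combines them.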
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Yuanpeng Zhang, Li Wang, Danmin Qian, Xingyun Geng, Dengfu Yao, Jiancheng Dong