Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6898927 | Informatics in Medicine Unlocked | 2018 | 17 Pages |
Abstract
The amount of information available on the Internet is growing exponentially, and obtaining appropriate information from such a huge repository is therefore an indispensable yet complicated task. Because the structure of web pages differs across websites, there is no “one size fits all” technique for web data extraction. This creates the need for a technique that is independent of page structure, which this paper addresses by identifying informative content through semantic analysis rather than syntactic structure. Social web forums contain web pages that are generated using server-side templates, and the information present in such websites has a wide variety of applications, such as opinion mining, sentiment analysis, topic modeling, and trend analysis. Among social media forums, health discussion forums play a crucial role, and analyzing data extracted from such medical forums finds application in disease detection based on symptoms, determining adverse drug reactions, suggesting clinical tests for diseases, and so on. In this paper, a fully automated technique for extracting posts from various medical forum websites is devised, and it performs well on differently structured web pages belonging to diverse forum websites. Because the technique is based on semantic features, it can be applied to other social web forums as well.
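The idea of selecting content by what it says rather than by the tags that enclose it can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify which semantic features are used, so the cue vocabulary `HEALTH_TERMS`, the scoring function `semantic_score`, the `threshold` value, and the helper names below are all hypothetical assumptions chosen only to show the principle.

```python
from html.parser import HTMLParser

# Hypothetical cue vocabulary (an assumption; the paper's actual semantic
# features are not described in the abstract).
HEALTH_TERMS = {
    "pain", "symptom", "symptoms", "doctor", "medication",
    "dose", "diagnosed", "headache", "fever",
}


class TextBlockCollector(HTMLParser):
    """Collect visible text blocks without looking at which tags enclose them."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Normalize whitespace and keep only non-empty, visible text.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.blocks.append(text)


def semantic_score(text):
    """Fraction of words that are health-related cues (toy stand-in for semantic analysis)."""
    words = [w.strip(".,!?;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    if len(words) < 5:  # very short blocks are unlikely to be posts
        return 0.0
    hits = sum(1 for w in words if w in HEALTH_TERMS)
    return hits / len(words)


def extract_posts(html, threshold=0.05):
    """Return text blocks whose score meets the (assumed) threshold."""
    collector = TextBlockCollector()
    collector.feed(html)
    collector.close()
    return [b for b in collector.blocks if semantic_score(b) >= threshold]
```

Because the filter never inspects tag names or class attributes, the same call works whether a forum wraps its posts in `div`, `table`, or any other markup; navigation links and copyright footers are dropped only because they score low on the assumed cues.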
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)
Authors
Umamageswari Baskaran, Kalpana Ramanujam