| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 4951737 | Science of Computer Programming | 2017 | 63 Pages |
Abstract
Researchers have proposed several approaches to recognize, extract, and analyze structured data embedded in natural language. We analyze these approaches and investigate their drawbacks. Subsequently, we present two novel methods, based on scannerless generalized LR (SGLR) and Parsing Expression Grammars (PEGs), to address these drawbacks and to mine structured fragments within unstructured data. We validate and compare these approaches on development emails and Stack Overflow posts with Java code fragments. Both approaches achieve high precision and recall values, but the PEG-based one achieves better computational performances and simplicity in engineering.
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Alberto Bacchelli, Andrea Mocci, Anthony Cleve, Michele Lanza,
