Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
433461 | 1441716 | 2012 | 20-page PDF | Free download |
![First page of the article: WSDL term tokenization methods for IR-style Web services discovery](/preview/png/433461.png)
IR-style Web services discovery is an important approach that applies proven techniques from the field of Information Retrieval (IR). Many studies have exploited the syntax of the Web Services Description Language (WSDL) to extract useful service metadata for building indexes. However, a fundamental issue with this approach is WSDL term tokenization. This paper proposes applying three statistical methods to WSDL term tokenization: MDL, TP, and PPM. With the increasing need for effective IR-style Web services discovery facilities, term tokenization is of fundamental importance for properly indexing WSDL documents. We compare the applied methods with two baseline methods. The experiment suggests that the MDL and PPM methods are superior according to IR evaluation metrics. To the best of our knowledge, our work is the first to systematically investigate WSDL term tokenization for Web services discovery. Our solution can benefit source code mining, in which a key step is to tokenize the names (i.e., terms) of variables, functions, classes, modules, etc. for semantic analysis. Our methods could also be used to solve Web-related string tokenization problems such as URL analysis and Web script comprehension.
► We address a critical issue for Information Retrieval-style Web services discovery.
► We propose the use of statistical methods for WSDL term tokenization.
► We show the superiority of our methods compared to two baseline methods.
► Our methods can be used for source code mining and automated script comprehension.
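The tokenization problem described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's MDL, TP, or PPM method, and the abstract does not specify the two baselines. Both functions are assumptions made for illustration: `baseline_tokenize` is a hypothetical rule-based splitter (delimiters and case changes), and `statistical_tokenize` is a simple unigram-probability segmenter standing in for a statistical approach. The vocabulary and its counts are invented.

```python
import math
import re

# Hypothetical toy vocabulary with made-up counts; a real system would
# estimate such statistics from a corpus of WSDL documents.
WORD_COUNTS = {"get": 50, "weather": 20, "by": 30, "city": 25, "name": 40}
TOTAL = sum(WORD_COUNTS.values())


def baseline_tokenize(term):
    """Rule-based split on non-alphanumeric characters and lower-to-upper case changes."""
    # Insert a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", term)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def statistical_tokenize(term):
    """Choose the segmentation with the highest total unigram log-probability."""
    text = term.lower()
    n = len(text)
    # best[i] holds (score, tokens) for the best segmentation of text[:i].
    best = [(0.0, [])] + [(-math.inf, [])] * n
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in WORD_COUNTS:
                logp = math.log(WORD_COUNTS[piece] / TOTAL)
            else:
                # Assumed penalty for unknown pieces, growing with their length.
                logp = -10.0 * len(piece)
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    return best[n][1]
```

On a camel-case term such as `getWeatherByCity`, the rule-based splitter already recovers the four words. On the all-lowercase `getweatherbycity` it returns the term unsplit, while the unigram segmenter recovers `get`, `weather`, `by`, `city` from the toy vocabulary. Terms without case or delimiter cues are where a statistical method has something to add.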
Journal: Science of Computer Programming - Volume 77, Issue 3, 1 March 2012, Pages 355–374