HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset

Article ID	Journal ID	Year	English article
4945134	1438296	2017	22-page PDF


Related subjects

Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence


Abstract

- This work is a detailed companion reproducibility paper of the methods and experiments proposed in three previous works by Lastra-Díaz and García-Serrano. It introduces a set of reproducible experiments on word similarity, based on HESML and ReproZip, with the aim of exactly reproducing the experimental surveys in those works.
- This work introduces a new representation model for taxonomies called PosetHERep, and a Java software library based on it, called the Half-Edge Semantic Measures Library (HESML), which implements most of the ontology-based semantic similarity measures and Information Content (IC) models for WordNet reported in the literature.
- PosetHERep, an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs in computational geometry, is a memory-efficient representation for taxonomies that scales linearly with the size of the taxonomy and provides an efficient implementation of a large set of topological queries and graph-based algorithms.
- This work also introduces a replication framework and dataset, called WNSimRep v1, which is provided as supplementary material and whose aim is to assist the exact replication of most similarity measures and IC models reported in the literature.
- Finally, this work introduces an experimental survey on the performance and scalability of the most recent state-of-the-art semantic measures libraries. This survey confirms that HESML outperforms the state-of-the-art libraries in performance and scalability by a statistically significant margin, and that the performance and scalability of semantic measures libraries can be significantly improved without caching by using PosetHERep.
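The half-edge idea behind PosetHERep can be illustrated with a minimal Java sketch. It assumes a simplified layout chosen here for illustration, not the paper's exact PosetHERep layout: each vertex stores one pointer to an outgoing half-edge, each is-a link is split into two opposite half-edges, and each half-edge stores its target vertex, its twin, and the next outgoing half-edge around the same source vertex. The class and method names (`HalfEdgeTaxonomy`, `addIsA`, `neighbours`, `source`) are hypothetical and are not HESML's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical half-edge-style taxonomy sketch; names are illustrative, not HESML's API.
class HalfEdgeTaxonomy {

    static final class Vertex {
        final int id;
        final String label;
        HalfEdge first; // any outgoing half-edge of this vertex's adjacency loop, or null
        Vertex(int id, String label) { this.id = id; this.label = label; }
    }

    static final class HalfEdge {
        final Vertex target;
        final boolean toParent; // true if the half-edge goes from a concept to its subsumer
        HalfEdge opposite;      // reversed twin of this half-edge
        HalfEdge next;          // next outgoing half-edge around the same source vertex (cyclic)
        HalfEdge(Vertex target, boolean toParent) { this.target = target; this.toParent = toParent; }
    }

    private final List<Vertex> vertices = new ArrayList<>();

    Vertex addVertex(String label) {
        Vertex v = new Vertex(vertices.size(), label);
        vertices.add(v);
        return v;
    }

    // Splice h into the cyclic list of outgoing half-edges of source in O(1).
    private static void splice(Vertex source, HalfEdge h) {
        if (source.first == null) {
            source.first = h;
            h.next = h;
        } else {
            h.next = source.first.next;
            source.first.next = h;
        }
    }

    // One is-a link = two opposite half-edges, so memory grows linearly with |V| + |E|.
    void addIsA(Vertex child, Vertex parent) {
        HalfEdge up = new HalfEdge(parent, true);
        HalfEdge down = new HalfEdge(child, false);
        up.opposite = down;
        down.opposite = up;
        splice(child, up);
        splice(parent, down);
    }

    // A half-edge stores only its target; its source is recovered through the twin.
    static Vertex source(HalfEdge h) { return h.opposite.target; }

    // Walk the adjacency loop of v once; the cost is proportional to v's degree.
    static List<Vertex> neighbours(Vertex v, boolean parents) {
        List<Vertex> result = new ArrayList<>();
        HalfEdge start = v.first;
        if (start == null) return result;
        HalfEdge h = start;
        do {
            if (h.toParent == parents) result.add(h.target);
            h = h.next;
        } while (h != start);
        return result;
    }

    public static void main(String[] args) {
        HalfEdgeTaxonomy t = new HalfEdgeTaxonomy();
        Vertex entity = t.addVertex("entity");
        Vertex animal = t.addVertex("animal");
        Vertex dog = t.addVertex("dog");
        Vertex cat = t.addVertex("cat");
        t.addIsA(animal, entity);
        t.addIsA(dog, animal);
        t.addIsA(cat, animal);
        for (Vertex p : neighbours(dog, true)) System.out.println("dog is-a " + p.label);
        for (Vertex c : neighbours(animal, false)) System.out.println("animal has child " + c.label);
    }
}
```

Because every link costs exactly two half-edge records and every vertex one pointer, memory in this sketch grows linearly with the number of concepts and is-a links. Parent and child queries only walk the adjacency loop of a single vertex, without scanning the whole graph.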

This work is a detailed companion reproducibility paper of the methods and experiments proposed by Lastra-Díaz and García-Serrano in 2015 and 2016 [56-58], and it introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip, with the aim of exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of the current semantic measures libraries, especially in performance and scalability, as well as in the evaluation of new methods and the replication of most previous methods. The reproducible experiments introduced herein are motivated by the lack of a set of large, self-contained and easily reproducible experiments with which to replicate and confirm previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and of difficulties in reproducing previously reported methods and experiments.
PosetHERep proposes a memory-efficient representation for taxonomies that scales linearly with the size of the taxonomy and provides an efficient implementation of most taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML provides an open framework to aid research in the area through a simpler and more efficient software architecture than that of the current software libraries. Finally, we demonstrate that HESML outperforms the state-of-the-art libraries, and that their performance and scalability can be significantly improved without caching by using PosetHERep.
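To make the role of IC models and similarity measures concrete, the following Java sketch computes one classic intrinsic IC model from the literature, that of Seco et al., IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms of c and N is the number of concepts, together with the Resnik similarity, defined as the IC of the most informative common ancestor. For brevity it stores the taxonomy in plain hash maps rather than PosetHERep, and the class and method names (`IntrinsicIcResnik`, `seco`, `resnik`) are hypothetical, not HESML's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an intrinsic IC model plus the Resnik measure; not HESML's API.
class IntrinsicIcResnik {
    // Multiple inheritance is allowed, as in the WordNet noun taxonomy.
    private final Map<String, List<String>> parents = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    void addConcept(String c) {
        parents.putIfAbsent(c, new ArrayList<>());
        children.putIfAbsent(c, new ArrayList<>());
    }

    void addIsA(String child, String parent) {
        addConcept(child);
        addConcept(parent);
        parents.get(child).add(parent);
        children.get(parent).add(child);
    }

    // Reflexive ancestor set: the concept itself plus every subsumer.
    Set<String> ancestors(String c) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(c);
        while (!stack.isEmpty()) {
            String x = stack.pop();
            if (seen.add(x)) for (String p : parents.get(x)) stack.push(p);
        }
        return seen;
    }

    // Number of distinct hyponyms (descendants) of c, excluding c itself.
    int hyponymCount(String c) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(children.get(c));
        while (!stack.isEmpty()) {
            String x = stack.pop();
            if (seen.add(x)) for (String ch : children.get(x)) stack.push(ch);
        }
        return seen.size();
    }

    // Seco et al. intrinsic IC; assumes the taxonomy holds at least two concepts.
    double seco(String c) {
        int n = parents.size();
        return 1.0 - Math.log(hyponymCount(c) + 1) / Math.log(n);
    }

    // Resnik similarity: IC of the most informative common ancestor (0 if none).
    double resnik(String a, String b) {
        Set<String> common = ancestors(a);
        common.retainAll(ancestors(b));
        double best = 0.0;
        for (String c : common) best = Math.max(best, seco(c));
        return best;
    }

    public static void main(String[] args) {
        IntrinsicIcResnik m = new IntrinsicIcResnik();
        m.addIsA("animal", "entity");
        m.addIsA("dog", "animal");
        m.addIsA("cat", "animal");
        System.out.printf("IC(animal) = %.4f%n", m.seco("animal"));
        System.out.printf("Resnik(dog, cat) = %.4f%n", m.resnik("dog", "cat"));
    }
}
```

On the toy taxonomy in `main`, the root `entity` gets IC 0, the leaves get IC 1, and `resnik("dog", "cat")` equals the IC of `animal`. Each call repeats the ancestor and hyponym traversals without any caching, so its cost depends directly on how efficiently the underlying taxonomy representation answers these topological queries.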

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 66, June 2017, Pages 97-118

Authors

Juan J. Lastra-Díaz, Ana García-Serrano, Montserrat Batet, Miriam Fernández, Fernando Chirigati
