Article ID: 1114768
Journal: Procedia - Social and Behavioral Sciences
Published Year: 2014
Pages: 6
File Type: PDF
Abstract

This work is part of a project that aims to define a methodology for building simple but robust stemmers without knowledge of the stemmer's target language. The methodology starts with a very simple primary stemmer that is applied to a collection of words and returns the corresponding stems. The primary stemmer always removes the longest suffix that matches the ending of the examined word. Next, Information Retrieval (IR) experts express their arguments against the results of the primary stemmer. The methodology then allows the creation of a number of consecutive trial stemmers that conform increasingly to the arguments expressed by the IR experts. Here, we focus on the attributes and the adjustable characteristics/options available to the person responsible for building the consecutive trial stemmers and, finally, for creating the best trial: the stemmer that respects the arguments against the primary stemmer as much as possible.
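
The longest-suffix rule of the primary stemmer can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `primary_stem`, the suffix list `SUFFIXES`, and the rule that a suffix must be shorter than the word (so that the stem is never empty) are assumptions made only for illustration.

```python
def primary_stem(word, suffixes):
    """Strip the longest suffix in `suffixes` that matches the end of `word`."""
    # Try longer suffixes first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Assumption: the suffix must be shorter than the word,
        # so the returned stem is never empty.
        if suffix and len(suffix) < len(word) and word.endswith(suffix):
            return word[: -len(suffix)]
    return word  # no suffix matched; the word is its own stem


# Hypothetical English-like suffix list, for illustration only.
SUFFIXES = ["s", "es", "ing", "ings", "ed"]

words = ["meetings", "boxes", "walked", "sing", "cat"]
stems = {w: primary_stem(w, SUFFIXES) for w in words}
print(stems)
```

With this suffix list, "sing" is reduced to "s", which is the kind of result an IR expert might argue against and a later trial stemmer might be adjusted to avoid.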
