Text classification based filters for a domain-specific search engine

Article ID	Journal	Published Year	Pages	File Type
508854	Computers in Industry	2016	10 Pages	PDF

Abstract

• Usage of text classification for filters in domain-specific search engines.
• Annotation study with the outcome of a new text corpus for evaluation of document classification.
• Insights into the deployment of new filters in search engines in a real application scenario.
• On- and off-line evaluation of the approach.
• Extensive study on the impact of the system's parameters.

Domain-specific search engines exist in various fields, providing additional value by exploiting knowledge of their respective domains. One common mechanism is the filter, which narrows down the search results based on pre-defined filter categories. In this article, we use a text classification system to create these filters. The approach is tailored to work in large-scale settings with reduced amounts of manually annotated training data and hence enables a cost-efficient roll-out of new filters. An initial annotation study resulted in a corpus that was used for an off-line evaluation of the approach, giving insights into the effect of the system's parameters. Finally, a large on-line evaluation was conducted together with a provider of a domain-specific search engine. This article presents important aspects that need to be taken into consideration when implementing text classification-based filters in the industrial setting of a domain-specific search engine.

Keywords

Text classification; Search engines; Active learning