Crowd-based Feature Selection for Document Retrieval in Highly Demanding Decision-making Scenarios

Article ID	Journal	Published Year	Pages	File Type
4960661	Procedia Computer Science	2017	11 Pages	PDF

Abstract

Automatic dimensionality reduction in text classification requires large training data sets due to the high dimensionality of the native feature space. However, in several real-world multi-label problems, such as highly demanding decision-making scenarios, manually classifying documents and selecting features in large document sets is usually infeasible, even for specialist teams. This paper presents CrowdFS, a first approach to using collective intelligence techniques to select label-specific relevant features from a large document set. An experiment in the context of competitive intelligence for a multinational energy company showed that CrowdFS produced better results than an automatic state-of-the-art technique.

Keywords

Document retrieval; Crowd; Business intelligence; Collective intelligence; Dimensionality reduction