Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
433949 | 1441628 | 2016 | 25-page PDF | Free download |

• A new workflow management web platform for text mining and NLP, TextFlows, was developed.
• A survey and a detailed comparison of TextFlows with five other NLP platforms are provided.
• TextFlows enables simple evaluation of algorithms from the NLTK, LATINO and scikit-learn libraries.
• LATINO's Max Entropy classifier achieves the best results in document categorization.
• Part-of-speech tagging improves the accuracy of document classification.
Text mining and natural language processing are fast-growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: a comparison of document classifiers and a comparison of different part-of-speech taggers, both on a text categorization problem, and outlier detection in document corpora.
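To make the idea of workflow construction and execution concrete, the following is a minimal sketch of a text processing workflow expressed as a dependency graph and run in topological order. It is not the TextFlows API: the step functions (`load_corpus`, `tokenize`, `count_terms`), the `WORKFLOW` layout and the `run_workflow` helper are all hypothetical names chosen for illustration, and the sketch assumes Python 3.9 or later for `graphlib`.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical processing steps; illustrative only, not TextFlows widgets.
def load_corpus():
    return ["Text mining is fun", "Workflows make text mining reusable"]


def tokenize(docs):
    # Lowercase each document and split it on whitespace.
    return [doc.lower().split() for doc in docs]


def count_terms(tokenized):
    # Count how often each token occurs across the whole corpus.
    return Counter(token for doc in tokenized for token in doc)


# Each step maps to (function, names of the upstream steps feeding it).
WORKFLOW = {
    "load": (load_corpus, []),
    "tokenize": (tokenize, ["load"]),
    "count": (count_terms, ["tokenize"]),
}


def run_workflow(workflow):
    """Run every step after its dependencies and return all step outputs."""
    graph = {name: deps for name, (_, deps) in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run_workflow(WORKFLOW)
print(results["count"].most_common(2))
```

Because each step declares only its inputs, a workflow described this way can be shared and reused by swapping a single step, for example replacing the hypothetical `tokenize` with a part-of-speech tagger, without touching the rest of the graph.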
Journal: Science of Computer Programming - Volume 121, 1 June 2016, Pages 128–152