Article ID Journal Published Year Pages File Type
406785 Neurocomputing 2014 12 Pages PDF
Abstract

• We use a collection of style markers and our proposed TR-PAE to study text styles.
• The proposed TR-PAE has advantages in discovering hidden text style information.
• The simulations demonstrate that our method outperforms the conventional approaches.
• Our method reveals the cues of style changes caused by changing regions and time.

This paper proposes an effective algorithm for extracting cues of text styles. Documents in a collection are first converted to a high-dimensional data set with the assistance of a group of style markers. We then employ Trace Ratio Criterion Patch Alignment Embedding (TR-PAE) to obtain a lower-dimensional representation in a textual space. TR-PAE has two advantages. First, inter-class separability and intra-class compactness are well characterized by a specially designed intrinsic graph and penalty graph, both based on a discriminative patch alignment strategy. Second, the method is based on the trace ratio criterion, which directly represents the average between-class distance and average within-class distance in the low-dimensional space. To evaluate the proposed algorithm, three corpora are designed and collected from existing popular corpora and real-life data covering diverse topics and genres. Extensive simulations illustrate the feasibility and effectiveness of our implementation. They demonstrate that the proposed method can extract deeply hidden style information from given documents and can efficiently provide reliable text-style analysis results.
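
To make the trace ratio criterion concrete, the following is a minimal sketch and not the authors' TR-PAE implementation. The abstract does not specify how the intrinsic and penalty graphs are built, so this sketch assumes ordinary within-class and between-class scatter matrices as stand-ins for them. The function names `scatter_matrices` and `trace_ratio` are hypothetical, and the iterative eigen-decomposition used here is one common way to maximize a trace ratio under an orthonormality constraint, which may differ from the solver used in the paper. The synthetic data stands in for style-marker vectors.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class (S_b) and within-class (S_w) scatter of the rows of X.

    Assumption: these replace the paper's penalty and intrinsic graphs.
    """
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        centered = Xc - mean_c
        S_w += centered.T @ centered
    return S_b, S_w


def trace_ratio(S_b, S_w, dim, n_iter=50, tol=1e-10):
    """Maximize tr(W^T S_b W) / tr(W^T S_w W) over orthonormal W (d x dim)."""

    def ratio(W):
        return np.trace(W.T @ S_b @ W) / np.trace(W.T @ S_w @ W)

    W = np.eye(S_b.shape[0])[:, :dim]  # simple orthonormal starting point
    lam = ratio(W)
    for _ in range(n_iter):
        # eigh returns eigenvalues in ascending order, so the last `dim`
        # eigenvectors span the maximizer of tr(W^T (S_b - lam * S_w) W).
        _, vecs = np.linalg.eigh(S_b - lam * S_w)
        W = vecs[:, -dim:]
        new_lam = ratio(W)
        converged = new_lam - lam < tol
        lam = new_lam
        if converged:
            break
    return W, lam


# Synthetic stand-in for style-marker vectors: 3 classes, 5 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(30, 5)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)

S_b, S_w = scatter_matrices(X, y)
W, lam = trace_ratio(S_b, S_w, dim=2)
Z = X @ W  # low-dimensional representation of the documents
```

Each iteration cannot decrease the ratio, because the new `W` maximizes `tr(W^T (S_b - lam * S_w) W)`, whose value at the previous `W` is zero. The columns of `W` stay orthonormal, which is what lets the ratio be read as average between-class distance over average within-class distance in the projected space.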

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence