Article ID: 416291
Journal: Computational Statistics & Data Analysis
Published Year: 2015
Pages: 18 Pages
File Type: PDF
Abstract

There is a growing need to analyze datasets characterized by several sets of variables observed on a single set of observations. Such complex but structured datasets are known as multiblock datasets, and their analysis requires the development of new and flexible tools. For this purpose, Kernel Generalized Canonical Correlation Analysis (KGCCA) is proposed. It offers a general framework for multiblock data analysis that takes into account an a priori graph of connections between blocks. With a single monotonically convergent algorithm, KGCCA subsumes a remarkably large number of well-known and new methods as particular cases. KGCCA is applied to a simulated 3-block dataset and to a real molecular biology dataset that combines Gene Expression data, Comparative Genomic Hybridization data and a qualitative phenotype measured for a set of 53 children with glioma. KGCCA is available on CRAN as part of the RGCCA package.
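To make the idea of a graph of connections between blocks and a monotonically convergent algorithm concrete, here is a minimal sketch in Python. It is not the authors' KGCCA and not the API of the RGCCA package. It is a simplified linear (non-kernel) illustration under assumptions made here: each block gets a unit-norm weight vector, the criterion is the sum of covariances between connected block components, and weights are updated one block at a time. The function name `multiblock_components`, the design matrix `C`, and the simulated data are all hypothetical.

```python
import numpy as np


def multiblock_components(blocks, C, n_iter=50, seed=0):
    """Block-wise ascent on sum_{j,k} C[j,k] * cov(X_j a_j, X_k a_k), ||a_j|| = 1.

    Assumes C is symmetric with a zero diagonal (the graph of connections).
    """
    rng = np.random.default_rng(seed)
    n = blocks[0].shape[0]
    J = len(blocks)
    X = [b - b.mean(axis=0) for b in blocks]  # column-center each block

    # Random unit-norm starting weights, one vector per block.
    a = []
    for Xj in X:
        v = rng.standard_normal(Xj.shape[1])
        a.append(v / np.linalg.norm(v))

    def objective():
        y = [Xj @ aj for Xj, aj in zip(X, a)]
        return sum(C[j, k] * (y[j] @ y[k]) / n
                   for j in range(J) for k in range(J))

    history = [objective()]
    for _ in range(n_iter):
        for j in range(J):
            # Weighted sum of the components of the blocks connected to block j.
            z = sum(C[j, k] * (X[k] @ a[k]) for k in range(J))
            g = X[j].T @ z
            norm = np.linalg.norm(g)
            if norm > 0:
                a[j] = g / norm  # maximizer of a linear function on the unit sphere
        history.append(objective())
    return a, history


# Simulated 3-block example: every block is driven by one shared latent variable.
rng = np.random.default_rng(1)
latent = rng.standard_normal((100, 1))
blocks = [latent @ rng.standard_normal((1, p)) + 0.5 * rng.standard_normal((100, p))
          for p in (5, 8, 3)]
C = np.ones((3, 3)) - np.eye(3)  # all three blocks connected to each other
weights, history = multiblock_components(blocks, C)
```

Because each update maximizes the criterion over one block's weights while the others are held fixed, the values stored in `history` never decrease, which is the kind of monotone behavior the abstract refers to. Removing an edge from `C` (setting `C[j, k]` and `C[k, j]` to 0) drops the corresponding pair of blocks from the criterion.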

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics