
Integrative Methods for Genomic Data using Sparse Canonical Correlation Analysis

Date:
-
Location:
University of Kentucky, Whitehall Classroom Building room 304
Speaker(s) / Presenter(s):
Prabhakar Chalise, Mayo Clinic, Rochester, MN

 

Identification of complex multivariate relationships is important in large-scale genomic studies involving genetic variation, mRNA, proteins, etc., along with external environmental factors, which together give rise to complex diseases and phenotypes. One integrative analysis approach to assess the relationship between multiple types of genomic data is Canonical Correlation Analysis (CCA). However, when the number of variables far exceeds the number of subjects, as in large-scale genomic studies, traditional CCA methods cannot be used. In addition, when the variables are highly correlated, the sample covariance matrices become unstable or undefined. To overcome these two issues and to make the results biologically interpretable, Sparse Canonical Correlation Analysis (SCCA) for multiple data sets has been proposed using a lasso-type penalty. An additional step that uses the Bayesian Information Criterion (BIC) has also been suggested to further filter out unimportant variables. I have assessed the performance of the SCCA method with four different penalty functions, as well as methods for three data sets that maximize the sum of the pairwise correlations. I will present the possibility of using a weighted sum to better explain the relationships between the genotypic and phenotypic variables.
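
For readers unfamiliar with the idea, the following is a minimal sketch of how a lasso-type penalty can produce sparse canonical vectors for two data sets. It is an illustration only and is not taken from the talk: it assumes the within-set covariance matrices are replaced by the identity (a common simplification when variables outnumber subjects), alternates soft-thresholded updates on the cross-covariance matrix, and assumes no column is constant. The names `soft_threshold`, `sparse_cca`, `lam_u`, and `lam_v` are hypothetical, and the penalty values are arbitrary rather than tuned.

```python
import numpy as np


def soft_threshold(a, lam):
    """Lasso-type shrinkage: pull each entry toward zero by lam, clipping at zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _unit(a):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-8):
    """Return one pair of sparse canonical vectors (u, v) and their sample correlation.

    X is n x p and Y is n x q, with rows as subjects. Works when p, q > n because
    no covariance matrix is inverted (identity within-set covariance is assumed).
    """
    # Standardize columns so the cross-product below is a correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    K = X.T @ Y / (X.shape[0] - 1)  # p x q cross-correlation matrix

    # Start from the leading singular vectors of K (the unpenalized solution).
    U, _, Vt = np.linalg.svd(K, full_matrices=False)
    u, v = U[:, 0], Vt[0]

    for _ in range(n_iter):
        u_old = u
        # Alternate: update u given v, then v given u, shrinking small loadings to zero.
        u = _unit(soft_threshold(K @ v, lam_u))
        v = _unit(soft_threshold(K.T @ u, lam_v))
        if np.linalg.norm(u - u_old) < tol:
            break

    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr
```

Larger values of `lam_u` and `lam_v` zero out more loadings, so fewer variables enter each canonical variate. In practice the penalties would be chosen by cross-validation or a similar criterion; if they are set too high, both vectors collapse to zero and the correlation is undefined.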
