
Impact of the design matrix structure on the performance of LASSO: An empirical study

Date:
-
Location:
University of Kentucky, Whitehall Classroom Building room 242
Speaker(s) / Presenter(s):
Jaroslaw Harezlak, Indiana University

High-throughput technologies in medical research have provided statisticians with ever-increasing amounts of data. One of the methodological and practical challenges in analyzing such data is variable selection in regression models. The past 15 years have brought a formidable number of methods for variable selection when the number of covariates is much larger than the number of observations (p >> n). The majority of these methods fall under the category of penalized likelihood, which includes ridge regression, LASSO and its variations, SCAD, and the Dantzig selector.
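For reference, here is the penalized least-squares form (Gaussian likelihood) behind two of the methods named above, in standard textbook notation. The 1/(2n) scaling is one common convention and is not necessarily the one used in the talk.

```latex
\hat\beta(\lambda) = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda\, P(\beta),
\qquad
P_{\text{ridge}}(\beta) = \lVert \beta \rVert_2^2,
\qquad
P_{\text{LASSO}}(\beta) = \lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

The l1 penalty can set coefficients exactly to zero, which is what makes LASSO a variable selection method. The ridge penalty only shrinks coefficients toward zero.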
In our work, we provide simulation results on the performance of LASSO when the columns of the design matrix X are strongly dependent. We consider both the estimation error and the prediction error. We study how the results depend on the specification of the design matrix, on the "irrepresentability condition" of Zhao and Yu (2006), and on the "phase transition" of Donoho and Stodden (2006). We also compare these results with the more common situation in which the columns of X are orthogonal.
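The following sketch illustrates the first of these conditions. Write Σ = XᵀX/n, let S be the support of the true coefficient vector β with q = |S|, and partition Σ accordingly. In the strong form stated by Zhao and Yu (2006), the irrepresentable condition requires some η > 0 such that the first line below holds elementwise. As an illustrative choice of ours (not a setting taken from the talk), we take compound symmetry with 0 ≤ ρ < 1. Because the all-ones vector is an eigenvector of Σ_SS with eigenvalue 1 − ρ + qρ, the left-hand side then has a closed form:

```latex
\bigl\lvert \Sigma_{S^c S}\, \Sigma_{SS}^{-1}\, \operatorname{sign}(\beta_S) \bigr\rvert \le \mathbf{1} - \eta
\quad \text{(elementwise)}

\Sigma = (1-\rho)\, I_p + \rho\, \mathbf{1}\mathbf{1}^\top
\;\Longrightarrow\;
\bigl[\Sigma_{S^c S}\, \Sigma_{SS}^{-1}\, \operatorname{sign}(\beta_S)\bigr]_j
  = \frac{\rho \sum_{k \in S} \operatorname{sign}(\beta_k)}{1 - \rho + q\rho}
  \quad \text{for every } j \notin S .
```

When all nonzero coefficients share a sign, this value equals ρq/(1 − ρ + qρ), which stays below 1 for every ρ < 1. The margin η = (1 − ρ)/(1 − ρ + qρ), however, shrinks toward 0 as ρ → 1.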
In the compound symmetry case, we find that increased dependence between the columns of X results in larger estimation error but smaller prediction error. In the anisotropic correlation case, both estimation and prediction errors are largest when some covariates exhibit positive correlations and others exhibit negative correlations.
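Below is a minimal sketch of one replicate of a compound-symmetry simulation. It is written in plain Python so that it runs without extra packages. Every choice here is an assumption made for illustration, not the speaker's code:

- the latent-factor construction of X;
- the cyclic coordinate-descent solver;
- the objective (1/(2n))‖y − Xβ‖² + λ‖β‖₁ with no intercept;
- the two error metrics: estimation error ‖β̂ − β‖₂ and in-sample prediction error (1/n)‖X(β̂ − β)‖²₂.

The helper names (`compound_symmetry_design`, `lasso_cd`, `one_replicate`) are hypothetical.

```python
import math
import random


def compound_symmetry_design(n, p, rho, rng):
    """n x p design whose columns have unit variance and pairwise correlation rho.

    Assumes 0 <= rho < 1 and uses a shared latent factor per row:
    x_ij = sqrt(rho) * z_i + sqrt(1 - rho) * e_ij, with z_i, e_ij iid N(0, 1).
    """
    X = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        X.append([math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                  for _ in range(p)])
    return X


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward 0 by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_cd(X, y, lam, max_sweeps=500, tol=1e-8):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb; equals y because b starts at 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual (b_j added back).
            g = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(g, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves b
            break
    return b


def one_replicate(n, p, rho, beta, sigma, lam, seed=0):
    """Return (estimation error, in-sample prediction error) for one simulated data set.

    beta must have length p; sigma is the noise standard deviation.
    """
    rng = random.Random(seed)
    X = compound_symmetry_design(n, p, rho, rng)
    y = [sum(x * bt for x, bt in zip(row, beta)) + sigma * rng.gauss(0.0, 1.0)
         for row in X]
    b_hat = lasso_cd(X, y, lam)
    diff = [bh - bt for bh, bt in zip(b_hat, beta)]
    est_err = math.sqrt(sum(d * d for d in diff))
    pred_err = sum(sum(x * d for x, d in zip(row, diff)) ** 2 for row in X) / n
    return est_err, pred_err
```

A call such as `one_replicate(n=50, p=100, rho=0.9, beta=[1.0] * 5 + [0.0] * 95, sigma=1.0, lam=0.1)` returns one (estimation, prediction) error pair. A study of this kind would average such pairs over many seeds while varying rho. The tuning parameter lam is held fixed here for simplicity. In practice it is usually chosen per data set, for example by cross-validation.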
