Response Variable Selection in Multivariate Linear Regression

Authors

Khare, K. and Su, Z.

Journal

Statistica Sinica, to appear.

Abstract

In this article, we discuss response variable selection and subsequent estimation of the regression coefficients in multivariate linear regression. Because of the asymmetric roles of the predictors and responses in regression, response variable selection is markedly different from the usual predictor variable selection. When a response is inferred to have zero coefficients, it should not simply be removed from subsequent estimation. Instead, we analyze its relationship with the responses that have nonzero coefficients, which we call the dynamic responses. If it is correlated with the dynamic responses given all other responses, it should be retained as an ancillary statistic to improve the estimation efficiency of the nonzero coefficients. Otherwise, it can be removed from further inference (leading to significant resource savings in high-dimensional settings), and we call it a static response. Therefore, we can classify the responses into three categories: the dynamic responses, the ancillary responses, and the static responses. We derive an algorithm to identify these response variables, and provide an estimator of the regression coefficients based on the selection result. Applications to synthetic and real data illustrate the efficacy of the proposed response variable selection procedure in both low- and high-dimensional settings. Consistency of the variable selection procedures and asymptotic properties of the estimators are established both for the large sample setting and the high-dimensional small sample setting.
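
The three-way classification described in the abstract can be illustrated with a small population-level sketch. This is not the selection algorithm from the article: it assumes the true coefficient matrix and error covariance are known, and it reads "correlated with the dynamic responses given all other responses" as a nonzero entry in the precision matrix (inverse covariance), which is the standard Gaussian interpretation of partial correlation. The function name `classify_responses`, the tolerance `tol`, and the toy matrices are all hypothetical and chosen for illustration only.

```python
import numpy as np


def classify_responses(B, Sigma, tol=1e-8):
    """Label each response as 'dynamic', 'ancillary', or 'static'.

    B     : (r, p) coefficient matrix, one row per response (assumed known).
    Sigma : (r, r) error covariance of the responses (assumed known).
    tol   : threshold below which an entry is treated as zero (assumption).
    """
    B = np.asarray(B, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)

    # Off-diagonal zeros of the precision matrix correspond to zero partial
    # correlation given all other responses (Gaussian interpretation).
    Omega = np.linalg.inv(Sigma)

    # A response is dynamic if any of its coefficients is nonzero.
    has_signal = np.any(np.abs(B) > tol, axis=1)
    dynamic = np.flatnonzero(has_signal)

    labels = []
    for j in range(B.shape[0]):
        if has_signal[j]:
            labels.append("dynamic")
        elif dynamic.size and np.any(np.abs(Omega[j, dynamic]) > tol):
            # Zero coefficients, but partially correlated with a dynamic response.
            labels.append("ancillary")
        else:
            # Zero coefficients and no partial correlation with dynamic responses.
            labels.append("static")
    return labels


# Toy example: response 0 depends on the predictors, response 1 is correlated
# with response 0, and response 2 is independent of both.
B = np.array([[1.0, -2.0],
              [0.0, 0.0],
              [0.0, 0.0]])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
labels = classify_responses(B, Sigma)
print(labels)
```

In practice neither the coefficients nor the covariance are known, so the article's procedure has to infer these categories from data; the sketch only makes the definitions concrete.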

Download

The .pdf file of the article is available for download.

Supplement

Technical details.

Zhihua Su

College of Liberal Arts & Sciences