Envelope-based Partial Partial Least Squares

Authors

Park, Y., Su, Z. and Chung, D.

Journal

Statistics in Medicine, Vol 41(23), 4578-4592.

Abstract

Partial least squares (PLS) regression is a popular alternative to ordinary least squares regression because of its superior prediction performance demonstrated in many cases. In various contemporary applications, the predictors include both continuous and categorical variables. A common practice in PLS regression is to treat the categorical variables as continuous. However, studies find that this practice may lead to biased estimates and invalid inferences. Based on a connection between the envelope model and PLS, we develop an envelope-based partial PLS estimator that considers the PLS regression on the conditional distributions of the response(s) and continuous predictors given the categorical predictors. Root-n consistency and asymptotic normality are established for this estimator. Numerical studies show that this approach can achieve greater efficiency gains in estimation and produce better predictions. The method is applied to the identification of cytokine-based biomarkers for COVID-19 patients, revealing the association between the cytokine-based biomarkers and patients’ clinical information, including disease status at admission and demographic characteristics. The efficient estimation leads to a clear scientific interpretation of the results.
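
The idea of running PLS conditionally on the categorical predictors can be illustrated with a minimal sketch. This is not the envelope-based estimator from the article. It assumes a single categorical predictor, removes its effect by centering the response and the continuous predictors within each category, and then fits an ordinary one-component PLS to the residuals. All function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def center_within_groups(rows, groups):
    """Subtract the per-category mean from each row (a list of floats)."""
    sums, counts = {}, defaultdict(int)
    for row, g in zip(rows, groups):
        if g not in sums:
            sums[g] = [0.0] * len(row)
        sums[g] = [s + v for s, v in zip(sums[g], row)]
        counts[g] += 1
    means = {g: [s / counts[g] for s in sums[g]] for g in sums}
    return [[v - m for v, m in zip(row, means[g])]
            for row, g in zip(rows, groups)]


def one_component_pls(X, y):
    """One-component PLS coefficients for centered X (n x p) and centered y."""
    p = len(X[0])
    # Weight vector is proportional to X^T y, scaled to unit length.
    w = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on the single score.
    t = [sum(a * b for a, b in zip(row, w)) for row in X]
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return [q * v for v in w]


def partial_pls_sketch(X_cont, y, category):
    """Illustrative only: PLS on residuals after conditioning on one category."""
    X_res = center_within_groups(X_cont, category)
    y_res = [r[0] for r in center_within_groups([[v] for v in y], category)]
    return one_component_pls(X_res, y_res)


# Toy data (made up): slope 2 within each group, with a group-specific shift.
X_cont = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [12.0, 14.0, 16.0, 3.0, 5.0, 7.0]
category = ["a", "a", "a", "b", "b", "b"]

print(partial_pls_sketch(X_cont, y, category))  # [2.0]
```

In this toy example the within-group slope of 2 is recovered because the group shift is removed before the PLS step; ignoring the grouping and fitting the pooled data would mix the shift into the slope estimate.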

Download

The .pdf file of the article is available for download.

Supplement

Supplementary information.