LARGE-SCALE INFERENCE OF MULTIVARIATE REGRESSION FOR HEAVY-TAILED AND ASYMMETRIC DATA

Song, Youngseok; Zhou, Wen; Zhou, Wen-Xin

doi:10.5705/ss.202021.0003

2023

Abstract

Large-scale multivariate regression is a fundamental statistical tool with a wide range of applications. This study considers the problem of simultaneously testing a large number of general linear hypotheses, encompassing covariate-effect analysis, analysis of variance, and model comparisons. The challenge that accompanies a large number of tests is the ubiquitous presence of heavy-tailed and/or highly skewed measurement noise, which is the main reason for the failure of conventional least squares-based methods. For large-scale multivariate regression, we develop a set of robust inference methods to explore data features such as heavy tailedness and skewness, which are not visible to least squares methods. The new testing procedure is based on the data-adaptive Huber regression and a new covariance estimator of regression estimates. Under mild conditions, we show that our methods produce consistent estimates of the false discovery proportion. Extensive numerical experiments and an empirical study on quantitative linguistics demonstrate the advantage of the proposed method over many state-of-the-art methods when the data are generated from heavy-tailed and/or skewed distributions.

Details

Title LARGE-SCALE INFERENCE OF MULTIVARIATE REGRESSION FOR HEAVY-TAILED AND ASYMMETRIC DATA

Author(s) Song, Youngseok ; Zhou, Wen ; Zhou, Wen-Xin

Published in Statistica Sinica

Volume 33

Issue 3

Pages 1831-1852

Date 2023-07-01

Publisher Statistica Sinica, Taipei

ISSN 1017-0405
1996-8507

Keywords

General Linear Hypotheses; Heavy-Tailed And/Or Skewed Regression Errors; Huber Loss; Large-Scale Multiple Testing; Multivariate Regression; Quantitative Linguistics

DOI https://doi.org/10.5705/ss.202021.0003

Laboratories SDS

Record Appears in Scientific production and competences > SB - School of Basic Sciences > MATH - Institute of Mathematics > SDS - Chair of Statistical Data Science
Peer-reviewed publications
Work produced at EPFL
Journal Articles
Published

Grant DOE: DE-SC0018344
NSF: DMS-1811376
NIH: R01GM144961

Record creation date 2024-02-23