Article, 2024

Extremal Random Forests

Journal of the American Statistical Association, ISSN 0162-1459, 1537-274X, Volume ahead-of-print, Issue ahead-of-print, Pages 1-14, DOI 10.1080/01621459.2023.2300522

Contributors

  • Gnecco, Nicola, ORCID 0000-0002-0044-5208 (Corresponding author) [1] [2]
  • Terefe, Edossa Merga [2] [3]
  • Engelke, Sebastian, ORCID 0000-0001-6356-918X [2]

Affiliations

  1. [1] University of Copenhagen [NORA names: KU University of Copenhagen; University; Denmark; Europe, EU; Nordic; OECD]
  2. [2] University of Geneva [NORA names: Switzerland; Europe, Non-EU; OECD]
  3. [3] Hawassa University [NORA names: Ethiopia; Africa]

Abstract

Classical methods for quantile regression fail in cases where the quantile of interest is extreme and only few or no training data points exceed it. Asymptotic results from extreme value theory can be used to extrapolate beyond the range of the data, and several approaches exist that use linear regression, kernel methods or generalized additive models. Most of these methods break down if the predictor space has more than a few dimensions or if the regression function of extreme quantiles is complex. We propose a method for extreme quantile regression that combines the flexibility of random forests with the theory of extrapolation. Our extremal random forest (ERF) estimates the parameters of a generalized Pareto distribution, conditional on the predictor vector, by maximizing a local likelihood with weights extracted from a quantile random forest. We penalize the shape parameter in this likelihood to regularize its variability in the predictor space. Under general domain of attraction conditions, we show consistency of the estimated parameters in both the unpenalized and penalized case. Simulation studies show that our ERF outperforms both classical quantile regression methods and existing regression approaches from extreme value theory. We apply our methodology to extreme quantile prediction for U.S. wage data. Supplementary materials for this article are available online.
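The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, the weights stand in for those a quantile random forest would supply, the quadratic penalty around a reference shape `xi_ref` is an assumed form (the abstract says only that the shape parameter is penalized), and the grid search replaces a proper numerical optimizer. The final function applies the standard generalized Pareto extrapolation from an intermediate quantile `u` at level `tau0` to an extreme level `tau`.

```python
import math


def gpd_neg_log_lik(sigma, xi, exceedances, weights, lam=0.0, xi_ref=0.0):
    """Weighted GPD negative log-likelihood plus an assumed quadratic shape penalty."""
    if sigma <= 0:
        return math.inf
    total = 0.0
    for z, w in zip(exceedances, weights):
        if abs(xi) < 1e-9:
            # Exponential limit of the GPD as the shape tends to zero.
            log_dens = -math.log(sigma) - z / sigma
        else:
            arg = 1.0 + xi * z / sigma
            if arg <= 0:
                return math.inf  # observation outside the GPD support
            log_dens = -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(arg)
        total -= w * log_dens
    return total + lam * (xi - xi_ref) ** 2


def fit_gpd_grid(exceedances, weights, lam=0.0, xi_ref=0.0):
    """Crude grid search over (sigma, xi); a stand-in for a numerical optimizer."""
    sigmas = [0.1 * k for k in range(1, 51)]    # 0.1 to 5.0
    xis = [-0.5 + 0.05 * k for k in range(31)]  # -0.5 to 1.0
    return min(
        ((s, x) for s in sigmas for x in xis),
        key=lambda p: gpd_neg_log_lik(p[0], p[1], exceedances, weights, lam, xi_ref),
    )


def extreme_quantile(u, sigma, xi, tau0, tau):
    """Extrapolate from the intermediate quantile u (level tau0) to level tau > tau0."""
    ratio = (1.0 - tau0) / (1.0 - tau)
    if abs(xi) < 1e-9:
        return u + sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** xi - 1.0)


# Usage: deterministic exponential-like exceedances with equal (placeholder) weights.
n = 200
z = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
w = [1.0 / n] * n
sigma_hat, xi_hat = fit_gpd_grid(z, w, lam=0.01, xi_ref=0.0)
q_hat = extreme_quantile(u=0.0, sigma=sigma_hat, xi=xi_hat, tau0=0.8, tau=0.999)
```

In the method the abstract describes, the weights would depend on the predictor vector at which the prediction is made, so the fitted scale and shape, and hence the extrapolated quantile, vary across the predictor space.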

Keywords

Extremal Random Forest, Supplementary materials, U.S., U.S. wage data, additive model, approach, article, asymptotic results, attraction conditions, cases, conditions, consistency, data, data points, dimensions, distribution, domain of attraction condition, estimated parameters, extrapolation, flexibility, forest, function, general domain, generalized additive model, kernel, kernel methods, likelihood, linear regression, local likelihood, materials, method, methodology, model, parameters, penalized case, point, prediction, predictor space, predictor vector, predictors, quantile predictions, quantile random forest, quantile regression, quantile regression method, quantiles, random forest, range, regression, regression approach, regression function, regression method, results, shape, shape parameters, simulation, simulation study, space, study, theory, theory of extrapolation, training, training data points, value theory, values, variables, vector, wage data, weight

Funders

  • Swiss National Science Foundation

Data Provider: Digital Science