Super Learner

Mark J. van der Laan, University of California, Berkeley
Eric C. Polley, University of California, Berkeley
Alan E. Hubbard, University of California, Berkeley

Abstract

When learning a model to predict an outcome from a set of covariates, a statistician has many estimation procedures in their toolbox. Examples of these candidate learners include least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit, 2003; van der Laan et al., 2006; Sinisi et al., 2007) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that builds a weighted combination of many candidate learners, which we call the super learner. This article proposes a fast algorithm for constructing a super learner in prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.
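The idea summarized in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes squared-error loss, three hypothetical candidate learners (`fit_mean`, `fit_linear`, `fit_quadratic`), and weights obtained by unconstrained least squares of the outcome on the cross-validated predictions; the abstract itself does not fix any of these choices, and the function name `super_learner` is likewise illustrative.

```python
import numpy as np

# Hypothetical candidate learners: each takes training data and returns
# a prediction function. These stand in for any estimation procedure.
def fit_mean(X, y):
    m = y.mean()
    return lambda Xnew: np.full(len(Xnew), m)

def fit_linear(X, y):
    design = lambda M: np.column_stack([np.ones(len(M)), M])
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Xnew: design(Xnew) @ beta

def fit_quadratic(X, y):
    design = lambda M: np.column_stack([np.ones(len(M)), M, M ** 2])
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Xnew: design(Xnew) @ beta

def super_learner(X, y, learners, V=10, seed=0):
    """Sketch of a V-fold cross-validated weighted combination of learners."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), V)

    # Z[i, k] holds the prediction for observation i from learner k,
    # fitted on the folds that do not contain observation i.
    Z = np.empty((n, len(learners)))
    for val_idx in folds:
        train = np.ones(n, dtype=bool)
        train[val_idx] = False
        for k, fit in enumerate(learners):
            Z[val_idx, k] = fit(X[train], y[train])(X[val_idx])

    # Assumption of this sketch: weights minimize squared error of y on Z,
    # with no non-negativity or sum-to-one constraint.
    weights, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Refit every candidate on the full data and combine with the weights.
    fitted = [fit(X, y) for fit in learners]

    def predict(Xnew):
        P = np.column_stack([f(Xnew) for f in fitted])
        return P @ weights

    return predict, weights
```

Calling `super_learner(X, y, [fit_mean, fit_linear, fit_quadratic], V=10)` returns a prediction function and the fitted weight vector. Constrained weights (for example, non-negative weights) would be a reasonable variation but are not shown here.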

Submitted: June 7, 2007 · Accepted: August 28, 2007 · Published: September 16, 2007

Recommended Citation

van der Laan, Mark J.; Polley, Eric C.; and Hubbard, Alan E. (2007) "Super Learner," Statistical Applications in Genetics and Molecular Biology: Vol. 6 : Iss. 1, Article 25.
Available at: http://www.bepress.com/sagmb/vol6/iss1/art25

ISSN: 1544-6115 ©1999-2008 The Berkeley Electronic Press™ All rights reserved.
