Abstract

Random Feature (RF) models are used as efficient parametric approximations of kernel methods. We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR). For a Gaussian RF model with $P$ features, $N$ data points, and a ridge $\lambda$, we show that the average (i.e., expected) RF predictor is close to a KRR predictor with an effective ridge $\tilde{\lambda}$. We show that $\tilde{\lambda} > \lambda$ and that $\tilde{\lambda} \searrow \lambda$ monotonically as $P$ grows, thus revealing the implicit regularization effect of finite RF sampling. We then compare the risk (i.e., test error) of the $\tilde{\lambda}$-KRR predictor with the average risk of the $\lambda$-RF predictor and obtain a precise and explicit bound on their difference. Finally, we empirically find extremely good agreement between the test errors of the average $\lambda$-RF predictor and the $\tilde{\lambda}$-KRR predictor.
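The implicit regularization effect described above can be probed numerically. The following is a minimal sketch, not the paper's experimental setup: it draws Gaussian random features as samples of a Gaussian process with covariance $K$, averages the resulting $\lambda$-RF predictors by Monte Carlo, and grid-searches for the KRR ridge whose predictor best matches that average on held-out points. The RBF kernel, the toy 1D data, the values of $N$, $P$, and $\lambda$, the loss normalization $\frac{1}{N}\|\cdot\|^2 + \lambda\|\cdot\|^2$, and all function names are assumptions made for illustration. The fitted ridge is an empirical stand-in for $\tilde{\lambda}$, not the closed-form effective ridge.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, length_scale=0.5):
    """RBF kernel on 1D inputs (an illustrative choice of kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Toy problem sizes (assumptions): N data points, P features, ridge lam.
N, P, lam = 20, 5, 1e-2
n_draws = 4000  # Monte Carlo draws used to estimate the average RF predictor

x_train = np.linspace(-1.0, 1.0, N)
x_test = np.linspace(-0.95, 0.95, 30)
x_all = np.concatenate([x_train, x_test])
y = np.sin(3.0 * x_train)

K_all = rbf_kernel(x_all, x_all)
K = K_all[:N, :N]

# Matrix square root of K_all via eigendecomposition (robust to near-singular K).
w, V = np.linalg.eigh(K_all)
sqrt_K = V * np.sqrt(np.clip(w, 0.0, None))

def krr_predict(ridge):
    """KRR predictor on all points for the loss (1/N)||f(X)-y||^2 + ridge*||f||_K^2."""
    alpha = np.linalg.solve(K + N * ridge * np.eye(N), y)
    return K_all[:, :N] @ alpha

def rf_predict():
    """One Gaussian RF ridge predictor: the P features are i.i.d. GP samples with covariance K."""
    F_all = sqrt_K @ rng.standard_normal((len(x_all), P)) / np.sqrt(P)
    F = F_all[:N]
    theta = np.linalg.solve(F.T @ F + N * lam * np.eye(P), F.T @ y)
    return F_all @ theta

# Monte Carlo estimate of the average (expected) lam-RF predictor.
rf_mean = np.mean([rf_predict() for _ in range(n_draws)], axis=0)

# Grid-search the KRR ridge that best matches the average RF predictor on test points.
ridges = lam * np.logspace(0.0, 2.0, 201)
errors = [np.linalg.norm(krr_predict(r)[N:] - rf_mean[N:]) for r in ridges]
best_ridge = ridges[int(np.argmin(errors))]

print(f"ridge lam = {lam:.4f}, best-matching KRR ridge = {best_ridge:.4f}")
```

Rerunning the sketch with a larger `P` should bring the best-matching ridge closer to `lam`, in line with the monotone convergence stated above. The estimate is noisy for small `n_draws`.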

Details