Abstract

Recently, several theories, including the replica method, have made predictions for the generalization error of Kernel Ridge Regression. In some regimes, they predict that the method has a 'spectral bias': decomposing the true function $f^*$ on the eigenbasis of the kernel, it fits well the coefficients associated with the $O(P)$ largest eigenvalues, where $P$ is the size of the training set. This prediction works very well on benchmark data sets such as images, yet the assumptions these approaches make about the data are never satisfied in practice. To clarify when the spectral bias prediction holds, we first focus on a one-dimensional model where rigorous results are obtained, and then use scaling arguments to generalize and test our findings in higher dimensions. Our predictions include the classification case $f(x) = \mathrm{sign}(x_1)$ with a data distribution that vanishes at the decision boundary, $p(x) \sim x_1^{\chi}$. For $\chi > 0$ and a Laplace kernel, we find that (i) there exists a cross-over ridge $\lambda^*_{d,\chi}(P) \sim P^{-\frac{1}{d+\chi}}$ such that the replica method applies for $\lambda \gg \lambda^*_{d,\chi}(P)$ but not for $\lambda \ll \lambda^*_{d,\chi}(P)$, and (ii) in the ridgeless case, spectral bias predicts the correct training curve exponent only in the limit $d \to \infty$.

Details