Thanks Damien; are you sure of your interpretation of curvature wrt overfitting? AFAICT neither the XGBoost documentation nor its original paper mention it as a means to combat overfitting.
The way I see it, the use of curvature here is simply a Newton-Raphson step rather than a plain gradient step. Said differently, we're basically just using a local quadratic approximation to the loss. Put yet another way, the curvature information mostly determines the step size of the update.
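To make that concrete, here's a minimal sketch of the second-order leaf weight that the quadratic approximation gives (this is my own illustration, not XGBoost's actual code; the function name and the `reg_lambda` parameter are just for exposition):

```python
import numpy as np

# Illustrative sketch (not XGBoost's implementation): under the quadratic
# approximation of the loss, the optimal constant update for a leaf is
#   w* = -sum(g_i) / (sum(h_i) + lambda)
# where g_i and h_i are the first and second derivatives of the loss at the
# current prediction. The hessians effectively set the step size.

def newton_leaf_weight(grad, hess, reg_lambda=0.0):
    """Optimal leaf value under the local quadratic approximation of the loss."""
    return -np.sum(grad) / (np.sum(hess) + reg_lambda)

# Example with logistic loss: g = p - y, h = p * (1 - p)
y = np.array([1.0, 0.0, 1.0])
raw_pred = np.array([0.2, -0.1, 0.4])        # current raw scores
p = 1.0 / (1.0 + np.exp(-raw_pred))          # predicted probabilities
grad = p - y
hess = p * (1.0 - p)
print(newton_leaf_weight(grad, hess))        # curvature determines the step size
```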
In particular, if doing least-squares regression, the quadratic approximation is exact (the second derivative of the squared-error loss is constant), so the curvature-based method yields neither more nor less overfitting than fitting a regression tree by plain least squares (which is potentially a lot of overfitting).
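And here's the squared-error special case (again just an illustrative sketch under my own assumptions, with no regularization term): the Hessian is constant, so the Newton step collapses to the mean residual, i.e. exactly what an ordinary least-squares regression tree would put in that leaf.

```python
import numpy as np

# With squared-error loss L = 0.5 * (y - f)^2 we get g_i = f_i - y_i and
# h_i = 1, so the quadratic approximation is exact and (with lambda = 0)
# the Newton leaf weight equals the mean residual of an ordinary
# least-squares fit. The curvature adds nothing extra against overfitting.
y = np.array([3.0, 5.0, 4.0])
f = np.array([2.0, 2.0, 2.0])                # current predictions in this leaf
grad = f - y                                  # first derivatives
hess = np.ones_like(y)                        # second derivatives are constant
newton_step = -np.sum(grad) / np.sum(hess)
mean_residual = np.mean(y - f)
assert np.isclose(newton_step, mean_residual)
print(newton_step, mean_residual)            # identical: 2.0 and 2.0
```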