Gradient Boosting

Gradient boosting is one of the most popular boosting techniques, especially with decision trees, for regression and classification problems. Like other boosting methods, gradient boosting adds predictors to an ensemble sequentially, with each new predictor correcting its predecessor. The main difference from methods like AdaBoost is that each new predictor is trained to fit the residual errors made by the previous predictor, rather than reweighting the training instances. The residual error is the difference between the actual value and the predicted value. (For the squared-error loss used here, these residuals correspond to the negative gradient of the loss with respect to the current prediction, which is where the name comes from.)

Let’s take a look at a regression example using decision trees as the base predictors, which is called gradient tree boosting.
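The snippets below assume that the training data X and y already exist. As a minimal sketch, a hypothetical toy dataset might look like the following (the feature values and prices are made up purely for illustration):

import numpy as np

# Hypothetical toy data: one feature (size_in_sq_feet) per house,
# with the corresponding sale price in dollars as the target.
X = np.array([[1500], [2000], [2500], [3000]])
y = np.array([210_000, 295_000, 380_000, 460_000])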

from sklearn.tree import DecisionTreeRegressor

tree1 = DecisionTreeRegressor(max_depth=10)
tree1.fit(X, y)  # fit the first tree to the original targets

First, it trains a decision tree on the training data X, which has a single feature size_in_sq_feet, and the target vector y, which holds the house prices. This first predictor tries to predict the house prices directly.

y2 = y - tree1.predict(X)  # residual errors of the first tree
tree2 = DecisionTreeRegressor(max_depth=10)
tree2.fit(X, y2)           # fit the second tree to those residuals

The second predictor tree2 is trained to predict the residual errors y2 made by the first predictor tree1.

y3 = y2 - tree2.predict(X)  # residual errors left after two trees
tree3 = DecisionTreeRegressor(max_depth=10)
tree3.fit(X, y3)            # fit the third tree to those residuals

The third predictor tree3 is trained on the residual errors y3 made by the second predictor tree2. Gradient boosting keeps adding new predictors sequentially in this way, each one fitting the residual errors left by the ensemble so far.
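The same pattern generalizes to any number of trees. The following is a minimal sketch of the training loop under that assumption; n_trees and the trees list are names introduced here just for illustration:

from sklearn.tree import DecisionTreeRegressor

n_trees = 3                  # hypothetical number of boosting stages
trees = []
residuals = y                # start from the original targets
for _ in range(n_trees):
    tree = DecisionTreeRegressor(max_depth=10)
    tree.fit(X, residuals)                    # fit this tree to the current residuals
    residuals = residuals - tree.predict(X)   # update the residuals for the next tree
    trees.append(tree)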

It is simple to make a prediction on a new instance by summing the predictions of all the trees as follows:

def ensemble_predict(x_new):
    ans = 0
    for tree in (tree1, tree2, tree3):
        ans += tree.predict(x_new)  # sum the contributions of all the trees
    return ans

Given a new instance whose size_in_sq_feet is 2,000 and whose real house price is $300,000, the first predictor tree1 might predict a price of $250,000. The second one, tree2, might output $40,000, and the third one might predict $5,000. The final prediction of the ensemble would be $250,000 + $40,000 + $5,000 = $295,000, which is close to the real house price.
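As a quick usage sketch, the new instance is passed to predict as a 2-D array with one row and one feature; the dollar amounts above are illustrative, so the actual output depends on the training data:

x_new = [[2000]]                  # one new instance: size_in_sq_feet = 2,000
y_pred = ensemble_predict(x_new)  # sum of the three trees' predictions
print(y_pred)                     # the ensemble's estimate of the house price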