Previously, we discussed decision trees for classification in fair detail. In this post, we will move on to their variant for the regression task. And, yes, we certainly have regression tree models! Long story short, the split mechanism in these trees is pretty much identical to that of classification trees. The main difference lies in how the models derive the predictions within their splits. So, let us wait no longer and get started.
By the way, like the earlier post, we are not focusing on modeling with decision trees in Python and SKLearn yet. The code I used for this post is mostly to generate data and visualize the trees and their decision boundaries. You can get it here if interested.
Simplest example of regression tree
Like always, we start with the simplest case of a data set with one feature and one target. Here, the task is to predict a test's grade based on study time, using data from several students. The scatter plot for this data is shown below. As you can see, there is a positive correlation between study time and grade.
Models like linear regression and support vector regression would simply fit a straight line through the instances. A decision tree, however, yields a very different prediction pattern, as follows. In short, the tree splits the input (study time) into different regions, each of which is then assigned a flat prediction value. In the scatter plot below, I denoted each region with a different fill color. You can see that the prediction value is flat and equal within each region and only changes at the boundaries. The splits and predictions of the tree are detailed in the tree plot. Aside from the predictions, which are now numeric, the splits are identical to those of classification trees.
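To make this concrete, here is a minimal sketch of fitting a regression tree on this kind of data with scikit-learn. The data below is synthetic and the variable names are my own; it is not the exact code behind the figures.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Made-up data: study time (hours) vs. test grade, for illustration only
rng = np.random.default_rng(0)
study_time = np.sort(rng.uniform(0, 10, 30)).reshape(-1, 1)
grade = 40 + 5 * study_time.ravel() + rng.normal(0, 5, 30)

# A shallow tree so the splits stay easy to read
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(study_time, grade)

# The prediction is piecewise constant: one flat value per region
print(export_text(tree, feature_names=["study_time"]))
print(tree.predict([[1.0], [5.0], [9.0]]))
```

The printed tree shows thresholds on study_time and a single flat value per leaf, which is exactly the step-like pattern in the plot above.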
However, similar to their classification counterparts, without any controls, a regression tree will also grow indefinitely and overfit the data. Below is the fully-grown tree for the data in this example. It is more complicated than the tree above and basically splits until each instance is in its own separate prediction region. The overly complicated prediction pattern suggests that this tree is very likely overfitting the training data.
On the other hand, an overly simple tree is likely to underfit the data. In short, underfitting means the model is not able to learn all the meaningful patterns in the data. Below is an example of a tree with only one split. It splits the data into two regions, which yield pretty high errors for some instances. Overall, like with any other model, it is very important to tune regression trees.
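For a rough illustration of both extremes, the sketch below (again with made-up data, not the data from the figures) fits a fully-grown tree and a one-split tree on the same instances and compares their sizes and training scores.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same kind of made-up study time / grade data as before
rng = np.random.default_rng(0)
study_time = np.sort(rng.uniform(0, 10, 30)).reshape(-1, 1)
grade = 40 + 5 * study_time.ravel() + rng.normal(0, 5, 30)

# Fully-grown tree: keeps splitting until (almost) every instance has its own leaf
full_tree = DecisionTreeRegressor(random_state=0).fit(study_time, grade)
# One-split tree: a single threshold, so only two prediction values
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(study_time, grade)

print("full tree:", full_tree.get_n_leaves(), "leaves, train R^2 =", full_tree.score(study_time, grade))
print("stump:    ", stump.get_n_leaves(), "leaves, train R^2 =", stump.score(study_time, grade))
```

The fully-grown tree fits the training data almost perfectly, which is a red flag for overfitting, while the one-split tree is likely too coarse and underfits.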
The non-linear case
Now, let us investigate a data set with more non-linearity between the feature and the target. The scatter plot is shown below. We can observe a strong curved pattern in the instances. Previously, we would have had to use a linear model with a quadratic term, or SVR with a nonlinear kernel.
A regression tree, however, can fit this data quite easily. For example, the tree below did a fairly good job of capturing the “U-curve” pattern of the instances. The tree is also not overly complicated, with a maximum depth of 3.
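As a sketch of how such a fit might look, the snippet below uses synthetic U-shaped data of my own; a depth-3 tree already follows the curve with a handful of flat steps.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic "U-curve" data: quadratic relationship plus noise
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 100)).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 0.5, 100)

# A depth-3 tree approximates the curve with at most 8 flat steps
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(x, y)
print(tree.predict([[-2.5], [0.0], [2.5]]))  # high near the edges, low in the middle
```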
It goes without saying, however, that without any controls, your tree will grow to overfit any data. The tree below is one such example. Once again, we see that the tree keeps splitting the input until every instance is in its own region, forming an overly complicated prediction function.
Like other models, regression trees tend to perform better with more data. In the examples below, we can observe that the trees capture the non-linear patterns in the data pretty well in denser regions. However, instances in more remote regions tend to get a whole split of their own.
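One way to see this effect, as a rough sketch with synthetic data of my own rather than the exact experiment behind the figures, is to train the same tree on samples of different sizes and score each on a common held-out set.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)

def make_data(n):
    # Same U-curve relationship as before, sampled n times
    x = rng.uniform(-3, 3, n).reshape(-1, 1)
    return x, x.ravel() ** 2 + rng.normal(0, 0.5, n)

x_test, y_test = make_data(1000)  # common held-out set

for n in (20, 100, 1000):
    x_train, y_train = make_data(n)
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(x_train, y_train)
    print(f"{n:5d} training instances -> test R^2 = {tree.score(x_test, y_test):.3f}")
```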
Two-dimensional examples
Let us also investigate some examples with more dimensions (features) in the data. The scatter plot below visualizes a data set with two features, represented by the two axes, and the target, represented by the instances' colors. The correlation between the target and either feature is pretty linear. This is shown by the consistent color transition as the value of either feature increases.
Like before, a tree will attempt to split the data on each individual feature. The result is a set of rectangular regions with constant prediction values, as in the scatter plot below. Here, I use background colors to represent the predictions from the tree. You can see that each prediction corresponds to a rectangular region in the data visualization.
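A sketch of how such a plot could be produced: the snippet below uses synthetic two-feature data of my own, predicts on a dense grid, and colors the background by the predicted value, which makes the rectangular regions visible.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

# Two made-up features with a roughly linear relationship to the target
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (200, 2))
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 2, 200)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Predict on a dense grid and shade the flat, rectangular prediction regions
xx, yy = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
zz = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, levels=20, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("feature 1")
plt.ylabel("feature 2")
plt.show()
```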
Finally, let us examine an example where a tree progressively becomes more complicated. We have not discussed tuning tree models yet, so I am just using max_depth for now. Below are illustrations of predictions on the same data made by four different trees, from max_depth = 2 to fully-grown. The tree with max_depth = 2 is probably too simple and not able to capture the pattern well. The trees with max_depth = 3 and 4 are fairly good. The fully-grown tree, as always, introduces very complicated prediction patterns that are likely overfitting the data.
Predictions from trees with max_depth = 2, max_depth = 3, max_depth = 4, and fully-grown.
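If you want to reproduce this kind of comparison yourself, a minimal sketch (with the same kind of synthetic two-feature data as above) could loop over a few depths; max_depth=None lets the tree grow fully.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same kind of made-up two-feature data as in the previous sketch
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (200, 2))
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 2, 200)

# Compare trees of increasing depth on the same data
for depth in (2, 3, 4, None):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    label = "fully-grown" if depth is None else f"max_depth = {depth}"
    print(f"{label}: {tree.get_n_leaves()} leaves, train R^2 = {tree.score(X, y):.3f}")
```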
Conclusion
In this post, we went through some discussion and illustrations of regression trees. Overall, trees are very flexible models that can capture very non-linear patterns in data. However, their flexibility makes trees very prone to overfitting, so they should be tuned carefully. For that reason, in the next post, we will discuss tuning tree models. So, see you next time!