Logistic Regression

an illustration of logistic regression

With the previous introduction to classification analysis, we can now discuss classification models in more detail. Since we already had some exposure to logistic regression last time, let us continue its story. Logistic regression is pretty much the linear model of classification: characteristically, it is indeed linear, and it has a complexity similar to linear regression. So, without further ado, let us jump right in!

Logistic regression

First, let us recall that the equation of a simple linear model is y = ax + b, with y being the target, x being the single feature, and a and b being the coefficient and intercept, respectively. This equation gives us the straight-line pattern in a 2-dimensional visualization, as in the left figure below. Furthermore, y is unbounded in linear regression. Previously, we saw that logistic regression predictions form an S curve, like in the right figure below.

an illustration of a linear model prediction

Prediction of a linear regression model

an illustration of logistic model prediction pattern

Prediction of a logistic regression model

So where does that S curve come from? We simply wrap the ax + b part in a function called the sigmoid. Furthermore, we know that logistic regression predicts the probability of an instance belonging to the positive class, which is generally denoted as P(y=1). Overall, we obtain the equation below for single-feature logistic regression.

P(y=1) = \dfrac{1}{1 + e^{-(ax + b)}}

You can easily verify that, with very extreme values of x, (ax + b) becomes either very positive or very negative, which makes e^{-(ax + b)} either very close to 0 or very large, and in turn, P(y=1) approaches but never passes 1 or 0. The exponential function also gives the pattern its curvy shape, as in the figure. Finally, with more features, we just add more coefficients, still inside the sigmoid function. The general equation of logistic regression is as follows.

P(y=1) = \dfrac{1}{1 + e^{-(a_0 + a_1x_1 + a_2x_2 + ... + a_kx_k)}}
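To make the squashing behavior concrete, here is a minimal sketch in plain NumPy with made-up coefficient values (not the model we fit later in the demonstration):

```python
import numpy as np

def sigmoid(z):
    # squash any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

a, b = 2.0, -1.0                             # made-up coefficient and intercept
x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

p = sigmoid(a * x + b)                       # P(y=1) for each x
print(p)                                     # near 0 for very negative x, near 1 for very positive x
```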

Regularized logistic regression

Like linear models, logistic models without any constraints can overfit the data and learn spurious patterns. To avoid this, we also regularize them. At a high level, logistic models are trained by minimizing a training loss, and we regularize them by adding a penalty term:

minimize \quad training\_loss + \alpha \cdot penalty

Again, exactly like linear models, there are three types of penalties: the sum of squared coefficients, the sum of absolute coefficients, and a mixture of the two. In terms of names, we call them L2 regularization, L1 regularization, and elastic-net regularization, respectively. In terms of behavior, these three act just like ridge regression, lasso, and elastic-net, but for logistic models.
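For concreteness, with coefficients a_1, ..., a_k and a mixing ratio r (called l1_ratio in scikit-learn; the exact scaling differs slightly between implementations, but the idea is the same), the three penalties look like this:

penalty_{L2} = \sum_{j=1}^{k} a_j^2 \qquad penalty_{L1} = \sum_{j=1}^{k} |a_j| \qquad penalty_{enet} = r\sum_{j=1}^{k} |a_j| + (1 - r)\sum_{j=1}^{k} a_j^2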

Demonstration

Loading data and preliminary analysis

You can access the complete notebook on my GitHub. I will be using the heart_disease.csv data set, which originally comes from Kaggle. It consists of data from 918 patients, including demographics and some medical measurements. The target column is HeartDisease, which indicates whether the patient had heart failure or not. After loading the data, a call to info() shows no issues with missing values or data types.
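The loading step looks roughly like this (a minimal sketch assuming the file name heart_disease.csv; the notebook on GitHub may differ in details):

```python
import pandas as pd

# load the data set and take a first look
df = pd.read_csv('heart_disease.csv')
df.info()      # 918 rows, no missing values, a mix of numeric and object columns
```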

Next, we perform a train-test split, then investigate histograms and bar charts. At a quick look, there are no serious issues. However, upon closer inspection, we can see some zeros in Cholesterol and RestingBP. These numbers do not make sense medically, so I will turn them into missing values and then impute them with the column medians.
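Sketched below, assuming the target column HeartDisease and an 80/20 split (the split ratio and random seed are assumptions, not necessarily the notebook's values):

```python
from sklearn.model_selection import train_test_split

# separate features and target, then split into training and testing sets
X = df.drop(columns='HeartDisease')
y = df['HeartDisease']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# count the medically implausible zeros
print((X_train[['Cholesterol', 'RestingBP']] == 0).sum())
```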

Processing

We only detect an issue with the zeros in Cholesterol and RestingBP, so my pipeline is as follows:
– Numeric columns: 1) remove the zeros from Cholesterol and RestingBP => 2) impute with the median => 3) standardization
– Categorical columns: one-hot encoding
Overall, the pipeline is as follows.
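Here is a sketch of this pipeline with scikit-learn's ColumnTransformer (column names are taken from the data set; the notebook may organize the steps differently):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder, FunctionTransformer

num_zero_cols = ['Cholesterol', 'RestingBP']               # zeros here are really missing values
num_other_cols = ['Age', 'FastingBS', 'MaxHR', 'Oldpeak']
cat_cols = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']

# replace the impossible zeros with NaN, then impute with the median and standardize
zero_pipe = Pipeline([
    ('zero_to_nan', FunctionTransformer(lambda X: X.replace(0, np.nan))),
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
])

preprocess = ColumnTransformer([
    ('num_fix', zero_pipe, num_zero_cols),
    ('num', StandardScaler(), num_other_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
])
```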

Modeling

L2 regularized logistic regression

First, let us try the L2-regularized model. We simply set penalty='l2' to get this model. Remember that this method is similar to ridge regression, so we need to fine-tune the regularization strength parameter, which is now C (in scikit-learn, C is the inverse of the strength, so a smaller C means stronger regularization). Other than that, the code is pretty much the same: create a parameter grid, then create and fit a grid search. After fitting, we can obtain the selected C from best_params_, which is 0.05. The training CV accuracy is 85.56% from best_score_, and the testing accuracy is 88.04%, obtained by calling score with the testing features and labels.
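A sketch of this step, reusing the preprocess transformer from above (the C grid and CV settings are my assumptions, not necessarily the notebook's values):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipe_l2 = Pipeline([
    ('prep', preprocess),
    ('model', LogisticRegression(penalty='l2', max_iter=1000)),
])

param_grid = {'model__C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10]}   # smaller C = stronger regularization
grid_l2 = GridSearchCV(pipe_l2, param_grid, cv=5, scoring='accuracy')
grid_l2.fit(X_train, y_train)

print(grid_l2.best_params_)              # selected C
print(grid_l2.best_score_)               # training CV accuracy
print(grid_l2.score(X_test, y_test))     # testing accuracy
```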

L1 logistic regression

If L2 regularization is similar to ridge, then L1 is similar to lasso. And that is pretty much all there is to remember about this model. In terms of usage, we change penalty to 'l1' and add solver='liblinear', since the default solver in scikit-learn does not support the L1 penalty. The rest is the same as the L2 model. In this case, our L1 model gets a training CV accuracy of 85.97% and a testing accuracy of 85.87%.
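Continuing the same sketch, only the model definition changes:

```python
pipe_l1 = Pipeline([
    ('prep', preprocess),
    ('model', LogisticRegression(penalty='l1', solver='liblinear')),
])

grid_l1 = GridSearchCV(pipe_l1, {'model__C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10]},
                       cv=5, scoring='accuracy')
grid_l1.fit(X_train, y_train)
print(grid_l1.best_score_, grid_l1.score(X_test, y_test))
```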

Elastic-net logistic regression

Finally, elastic-net logistic regression works the same as its counterpart among linear models. It uses a mixture of the L1 and L2 penalties, which means we also need to tune the l1_ratio parameter. To use this method, we set penalty='elasticnet' and solver='saga'. This one gets a training CV accuracy of 86.24% and a testing accuracy of 85.87%.
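A sketch of this last model, again with an assumed parameter grid:

```python
pipe_en = Pipeline([
    ('prep', preprocess),
    ('model', LogisticRegression(penalty='elasticnet', solver='saga', max_iter=5000)),
])

param_grid = {
    'model__C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10],
    'model__l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9],
}
grid_en = GridSearchCV(pipe_en, param_grid, cv=5, scoring='accuracy')
grid_en.fit(X_train, y_train)
print(grid_en.best_score_, grid_en.score(X_test, y_test))
```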

Conclusion

As you can see from the tests, the three types of regularized logistic regression behave quite similarly in terms of performance. So, just use whichever you prefer, unless you need the absolute best performance, in which case try fine-tuning all three. Anyway, I hope you have gained a good understanding of logistic regression from this post. This is probably my longest post so far, so it is time to stop. See you again next time!
