Logistic Regression in Machine Learning


Imagine you want to know whether someone will click on an ad. You have some basic info like age, income, or search history. Logistic regression takes that info and gives you a number like 0.73, which means there is a 73 percent chance they will click. If the number is more than 0.5 we say yes; if not, we say no. This is how machines make decisions in a smart way without being too complex.

In this blog we will explain logistic regression in a simple way. We will talk about how it works, the types of logistic regression in machine learning, some of the math behind it, examples from real life, and how you can use it in Python. No heavy math, just clear and easy-to-understand logic.


What is Logistic Regression?

Logistic regression is a basic machine learning algorithm that is used when we want to predict something that has only two outcomes, like yes or no, true or false, buy or not buy. It looks at the data we already have and tries to find a pattern so it can guess what might happen next. For example, if you give it details like a person’s age, income, and past purchases, it can tell you whether that person is likely to buy your product.

But instead of just saying yes or no right away, it gives you a number between 0 and 1 that tells how likely the answer is to be yes. If the number is more than 0.5 we take it as a yes, and if it is less we take it as a no. This number is called a probability, and it helps us make better decisions.

Even though it has the word regression in its name, it is mostly used for classification problems, which means sorting things into groups like spam or not spam, or pass or fail. It is easy to use, works well for simple problems, and is often the first algorithm people learn in machine learning.
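
To make this concrete, here is a minimal sketch of the idea using scikit-learn. The feature values, labels, and the new person below are made up purely for illustration:

from sklearn.linear_model import LogisticRegression
import numpy as np

# Made-up training data: [age, income in thousands], label 1 = bought, 0 = did not buy
X = np.array([[22, 25], [25, 30], [30, 45], [35, 60], [40, 80], [50, 120]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

new_person = np.array([[33, 55]])                  # age 33, income 55k
prob_yes = model.predict_proba(new_person)[0, 1]   # probability of class 1 ("buy")
label = int(prob_yes > 0.5)                        # apply the 0.5 threshold ourselves

print(f"Probability of buying: {prob_yes:.2f}, prediction: {label}")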

Maths Behind Logistic Regression

In logistic regression we do not use a straight line like in linear regression, because we are not predicting numbers, we are predicting chances. So instead of a plain line we use something called the sigmoid function. This function takes any number and turns it into a value between 0 and 1, and that value shows the chance of something happening.

Let us say you give some input like age or salary to the model. First it multiplies those inputs by some weights and adds them up. This is just basic math, like 5 times age plus 3 times salary. The result is a single number, which can be big or small. Then this number is passed through the sigmoid function, which changes it into a value between 0 and 1. For example it might turn 2.4 into about 0.92, which means there is roughly a 92 percent chance of getting a yes.
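
As a quick illustration, here is what the weighted sum plus sigmoid looks like in code. The weights, bias, and inputs are invented just to show the mechanics:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Invented weights and inputs, only for demonstration
weights = np.array([0.05, 0.03])   # weight for age, weight for salary (in thousands)
bias = -2.5
x = np.array([30, 50])             # age 30, salary 50k

z = np.dot(weights, x) + bias      # weighted sum: 0.05*30 + 0.03*50 - 2.5 = 0.5
p = sigmoid(z)                     # about 0.62 -> 62 percent chance of "yes"

print(f"z = {z:.2f}, probability = {p:.2f}")
print(f"sigmoid(2.4) = {sigmoid(2.4):.2f}")  # about 0.92, as mentioned above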

The math also uses something called a cost function, which checks how far each prediction is from the actual answer. If the model is wrong, the cost is high. So it keeps adjusting the weights again and again, using a method called gradient descent, until the cost becomes as low as possible. This is how the model learns from the data and gets better at making predictions.
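
The cost used for logistic regression is the log loss (also called binary cross-entropy). Here is a tiny sketch of how it is computed, with made-up labels and predicted probabilities:

import numpy as np

def log_loss(y_true, y_pred):
    # Binary cross-entropy: punishes confident but wrong predictions heavily
    eps = 1e-15                              # avoid taking log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])           # actual answers
good = np.array([0.9, 0.1, 0.8, 0.7])     # confident and mostly right -> low cost
bad = np.array([0.2, 0.9, 0.3, 0.4])      # mostly wrong -> high cost

print(f"cost for good predictions: {log_loss(y_true, good):.3f}")
print(f"cost for bad predictions:  {log_loss(y_true, bad):.3f}")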

Types of Logistic Regression

| Type | Number of Classes | Output Type | Real-life Example | When to Use |
| --- | --- | --- | --- | --- |
| Binary Logistic Regression | 2 | Yes/No or 0/1 | Spam or not spam | When your outcome has only two categories |
| Multinomial Logistic Regression | 3 or more (no order) | One class from many | Predicting whether a person chooses apple, banana, or orange | When you have more than two options without any ranking |
| Ordinal Logistic Regression | 3 or more (ordered) | One ranked category | Customer satisfaction: poor, fair, good, excellent | When categories have a natural order or level |
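
For binary problems scikit-learn's LogisticRegression works out of the box, and when the target has three or more unordered classes it handles the multinomial case automatically. A small sketch on the built-in iris dataset (three flower species, so a multinomial example):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # 3 classes -> multinomial logistic regression

model = LogisticRegression(max_iter=500)   # the solver handles multiple classes itself
model.fit(X, y)

print(model.predict(X[:5]))                  # predicted class labels for the first 5 flowers
print(model.predict_proba(X[:1]).round(3))   # one probability per class, summing to 1

Ordinal logistic regression is not part of scikit-learn itself; ordered categories are usually handled with dedicated tools such as statsmodels' OrderedModel or the mord package.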

Assumptions of Logistic Regression

Logistic regression makes a few assumptions about the data it learns from. The observations should be independent of each other, the input features should not be strongly correlated with one another (no multicollinearity), and the relationship between the features and the log-odds of the outcome should be roughly linear. For plain binary logistic regression the target should also have exactly two categories. The code below checks the multicollinearity assumption using the variance inflation factor (VIF): a high VIF for a feature means it is largely explained by the other features.

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

# Sample data: note that X2 is exactly 2 * X1, so these two features are perfectly correlated
data = {
    'X1': [1, 2, 3, 4, 5, 6],
    'X2': [2, 4, 6, 8, 10, 12],
    'X3': [5, 3, 6, 9, 12, 15],
    'y':  [0, 0, 0, 1, 1, 1]
}

df = pd.DataFrame(data)
X = df[['X1', 'X2', 'X3']]
X = sm.add_constant(X)   # add the intercept column, as statsmodels expects

# Compute the VIF for each column: how strongly a feature is explained by the others
vif = pd.DataFrame()
vif['Feature'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print(vif)
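
A common rule of thumb is that a VIF above roughly 5 to 10 signals problematic multicollinearity. In the sample data above, X2 is exactly twice X1, so their VIF values come out extremely large (effectively infinite); that is exactly the kind of overlapping feature that logistic regression assumes you have removed before training.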

How Does Logistic Regression Work?

| Step | What Happens | Explanation in Simple Words |
| --- | --- | --- |
| 1 | Take input features | Use data like age, income, marks, etc. |
| 2 | Multiply inputs by weights and add bias | Do basic math like 5 times age plus 2 times income plus a small extra number |
| 3 | Apply the sigmoid function | Turn the result into a value between 0 and 1 |
| 4 | Get the probability | This number shows the chance of saying yes or no |
| 5 | Use a threshold (like 0.5) | If the chance is more than 0.5 say yes, else say no |
| 6 | Compare prediction with actual result (calculate error) | Check if the model was right or wrong |
| 7 | Update weights using gradient descent | Adjust the math so the next guess is better |
| 8 | Repeat until the error becomes small | Keep improving until the model becomes good enough |
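
All of these steps fit in a few lines of NumPy. The following is a minimal from-scratch sketch, with toy data and an arbitrarily chosen learning rate and iteration count, not production code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: toy input features (age, income) and labels
X = np.array([[22, 25], [25, 30], [30, 45], [35, 60], [40, 80], [50, 120]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# Scale features so gradient descent behaves well
X = (X - X.mean(axis=0)) / X.std(axis=0)

weights = np.zeros(X.shape[1])    # start with zero weights
bias = 0.0
lr = 0.1                          # learning rate (chosen arbitrarily)

for _ in range(1000):                         # step 8: repeat until the error is small
    z = X @ weights + bias                    # step 2: weighted sum plus bias
    p = sigmoid(z)                            # steps 3-4: probabilities between 0 and 1
    error = p - y                             # step 6: how far off the predictions are
    weights -= lr * (X.T @ error) / len(y)    # step 7: gradient descent update
    bias -= lr * error.mean()

predictions = (sigmoid(X @ weights + bias) > 0.5).astype(int)   # step 5: threshold at 0.5
print("predictions:", predictions, "actual:", y.astype(int))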

Evaluation Metrics For Logistic Regression

These are the main evaluation metrics used to judge a logistic regression model:

  1. Accuracy
    Percentage of correctly predicted samples out of total samples.
  2. Precision
    Out of all predicted positives, how many are actually positive.
    Formula: Precision = TP / (TP + FP)
  3. Recall (Sensitivity)
    Out of all actual positives, how many were correctly predicted.
    Formula: Recall = TP / (TP + FN)
  4. F1 Score
    Harmonic mean of precision and recall, balances both.
    Formula: F1 = 2 * (Precision * Recall) / (Precision + Recall)
  5. ROC-AUC (Receiver Operating Characteristic – Area Under Curve)
    Measures the model’s ability to distinguish classes; higher is better.
  6. Log Loss (Cross-Entropy Loss)
    Measures how well the predicted probabilities match the actual labels; lower is better.
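
All of these metrics are available in scikit-learn. A short sketch, with made-up true labels and predicted probabilities, of how they are computed:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # made-up actual labels
y_prob = [0.2, 0.4, 0.8, 0.6, 0.9, 0.3, 0.4, 0.1]   # model's predicted probabilities
y_pred = [1 if p > 0.5 else 0 for p in y_prob]      # labels after the 0.5 threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))
print("Log loss :", round(log_loss(y_true, y_prob), 3))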

Advantages of Logistic Regression

  1. Easy to implement and understand.
  2. Works well for binary classification problems.
  3. Provides probabilities for class predictions.
  4. Can handle both continuous and categorical variables.
  5. Requires less computation compared to complex models.

Disadvantages of Logistic Regression

  1. Can only handle linear relationships between features and log-odds.
  2. Not suitable for complex or non-linear problems.
  3. Sensitive to outliers and noisy data.
  4. Assumes no multicollinearity among input features.
  5. Can struggle with a large number of features without proper regularization (see the sketch below).
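
On that last point, scikit-learn's LogisticRegression applies L2 regularization by default through its C parameter (smaller C means stronger regularization). A brief sketch on synthetic data, just to show the effect on the learned weights:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Synthetic data with many features, generated purely for illustration
X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

weak_reg = LogisticRegression(C=10.0, max_iter=1000).fit(X, y)    # weaker regularization
strong_reg = LogisticRegression(C=0.1, max_iter=1000).fit(X, y)   # stronger regularization

# Stronger regularization shrinks the learned weights toward zero
print(f"mean |weight|, C=10 : {abs(weak_reg.coef_).mean():.3f}")
print(f"mean |weight|, C=0.1: {abs(strong_reg.coef_).mean():.3f}")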

Applications of Logistic Regression

  1. Predicting whether a patient has a specific disease based on medical data.
  2. Assessing the likelihood that a loan applicant will default on payment.
  3. Forecasting if a customer will buy a product or respond to a campaign.
  4. Classifying emails accurately as spam or legitimate messages.
  5. Predicting if customers are likely to stop using a service or product.

Conclusion

Logistic regression is a powerful and easy-to-use classification method that helps predict binary outcomes. It works well when the relationship between the features and the target is roughly linear on the log-odds scale. Despite some limitations, such as handling only linear relationships and being sensitive to outliers, it remains widely used in fields like healthcare, finance, and marketing because of its interpretability and efficiency. Understanding its assumptions and evaluation metrics helps you build better, more reliable models.

Frequently Asked Questions (FAQs)

What is logistic regression used for?

It is used to predict binary outcomes like yes/no or true/false decisions based on input features.

How is logistic regression different from linear regression?

Logistic regression predicts probabilities and class labels for classification problems, while linear regression predicts continuous numeric values.

What are the key assumptions of logistic regression?

The main assumptions are no multicollinearity among features, linearity between features and the log-odds of the outcome, and independent observations.