# Linear Discriminant Analysis: A step by step Guide

Linear Discriminant Analysis, we often utilize logistic regression when we have a collection of predictor factors and want to categorize a response variable into one of two classes.

In the following circumstance, we may utilize logistic regression.

We want to use a customer’s credit score and bank balance to predict whether they would eligible on a loan. (Select “Yes” or “No” as the response variable.)

Convolutional Neural Networks » finnstats

## Linear Discriminant Analysis

When there are more than two distinct classes for a response variable, however, we usually choose to employ a method called linear discriminant analysis, or LDA.

Market Basket Analysis in Data Mining » What Goes WIth What » finnstats

For instance, in the following circumstance, we may use LDA

We want to use points per game to forecast if a high school cricket player will be accepted into one of three divisions, like Division 1, Division 2, or Division 3.

Although both LDA and logistic regression models are used for classification, it turns out that LDA is significantly more stable than logistic regression when making predictions for several classes, and is hence the preferable approach to utilize when the response variable has more than two classes.

When compared to logistic regression, LDA performs better when sample sizes are small, making it a favored method to utilize when big samples are unavailable.

Binomial Distribution in R-Quick Guide » finnstats

### How to Construct LDA Models

(1) Each predictor variable’s values are uniformly distributed. That is, a histogram depicting the distribution of values for a given predictor would roughly have a “bell shape.”

(2) The variance of each predictor variable is the same. Because this virtually never happens in real-world data, we scale each variable to have the same mean and variance before building an LDA model.

How to Calculate Phi Coefficient in R » Association » finnstats

LDA then estimates the following values once these assumptions are met:

μk: The average of all kth-class training observations.

σ2: For each of the k classes, the weighted average of the sample variances.

πk: The proportion of the training observations that belong to the kth class.

LDA then plugs these numbers into the formula below, assigning each observation X = x to the class with the highest value produced by the formula:

`Dk(x) = x * (k/2) – (k2/22) + log(k) Dk(x) = x * (k/2) – (k2/22)`

The name LDA comes from the fact that the value produced by the function above is the result of linear functions of x.

Types of Data Visualization Charts » Advantages» finnstats

#### How to Get Data Ready for LDA

Before using an LDA model on your data, make sure it fits the following criteria:

1. There is a categorical response variable. LDA models are intended for use in classification problems, in which the response variable can be classified into groups or categories.

2. The predictor variables are distributed normally. First, make sure that each predictor variable has a normal distribution. If this is not the case, you may want to alter the data first to normalize the distribution.

3. The variance of each predictor variable is the same. LDA presupposes that each predictor variable has the same variance, as previously stated.

Because this is uncommon in practice, it’s a good idea to scale each variable in the dataset to have a mean of 0 and a standard deviation of 1.

What Data Science Is and What You Can Do With It » finnstats

4. Take into account severe outliers. Before using LDA, make sure the dataset is free of extreme outliers. Using boxplots or scatterplots, you can usually visually check for outliers.