Feature Scaling in Machine Learning: Introduction
Feature scaling in machine learning is a technique for bringing the independent features of a dataset into a fixed range. It is performed as part of data preprocessing.
Consider a dataset describing 5,000 people, each with three independent features: Age, Income, and brand.
Each data point carries one of two labels:
• Class 1 (YES): a person with the given Age, Income, and brand values can buy the property.
• Class 2 (NO): a person with the given Age, Income, and brand values cannot buy the property.
The goal is to train a model on this dataset that predicts whether a property can be purchased given a set of feature values.
Once the model has been trained, the data points can be plotted in an N-dimensional space, where N is the number of features in the dataset.
The diagram below shows the model in its ideal form.
[Figure: Feature Scaling in Machine Learning]
As the diagram shows, stars represent Class 1 (YES) data points and circles represent Class 2 (NO) data points; these are the points used to train the model.
Now a new data point arrives (the diamond in the figure) with its own values for the three features described above (Age, Income, and brand).
The model must determine whether this data point belongs to the YES class or the NO class.
Predicting the class of the new data point:
The model computes the distance between this data point and the centroid of each class group.
The data point is then assigned to the class whose centroid is closest.
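The nearest-centroid rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the model from the figure: the function names (`centroid`, `euclidean`, `predict`) and the toy training points are hypothetical, and the points are assumed to be already scaled to [0, 1].

```python
import math

def centroid(points):
    """Per-feature mean of a list of equal-length feature vectors."""
    k = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(k)]

def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def predict(training, new_point, distance=euclidean):
    """Assign new_point to the class whose centroid is closest.

    training maps each class label to a list of feature vectors.
    """
    centroids = {label: centroid(points) for label, points in training.items()}
    return min(centroids, key=lambda label: distance(new_point, centroids[label]))

# Hypothetical, already-scaled [Age, Income, brand] training points.
training = {
    "YES": [[0.8, 0.9, 0.5], [0.7, 0.8, 0.75]],
    "NO": [[0.1, 0.2, 0.25], [0.2, 0.1, 0.0]],
}
print(predict(training, [0.75, 0.7, 0.5]))  # YES
```

Passing a different function as `distance` swaps the measure without changing the rest of the classifier.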
Any of the following measures can be used to compute the distance between a data point and a centroid:
Euclidean Distance: The square root of the sum of squared differences between the coordinates (the feature values Age, Income, and brand) of a data point and the centroid of a class:
$$d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$$
where x is the data point, y is the centroid, and k is the number of features; in this dataset, k = 3.
Manhattan Distance: The sum of absolute differences between the coordinates (feature values) of a data point and the centroid of a class:
$$d(x, y) = \sum_{i=1}^{k} |x_i - y_i|$$
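Minkowski Distance: A generalization of the two measures above, controlled by an order parameter p:
$$d(x, y) = \left(\sum_{i=1}^{k} |x_i - y_i|^p\right)^{1/p}$$
Setting p = 1 gives the Manhattan distance, and p = 2 gives the Euclidean distance.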
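A minimal Python sketch of the three measures, assuming feature vectors are equal-length lists of numbers (the function names are illustrative). The example also checks that the Minkowski distance with p = 1 and p = 2 reproduces the Manhattan and Euclidean results.

```python
def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def minkowski(x, y, p):
    """Order-p Minkowski distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 for a valid distance metric")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

a, b = [0, 0, 0], [3, 4, 0]
print(euclidean(a, b))     # 5.0
print(manhattan(a, b))     # 7
print(minkowski(a, b, 1))  # 7.0
print(minkowski(a, b, 2))  # 5.0
```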
Why Feature Scaling Is Required: The dataset has three features, Age, Income, and brand, and their ranges differ widely. Suppose Age ranges from 20 to 60 years, Income from 1 lakh to 40 lakhs (100,000 to 4,000,000), and brand (measured here as the number of bedrooms in a flat) from 1 to 5.
Assume the data point to be predicted is [60, 35 lakhs, 3] and the centroid of Class 1 is [50, 25 lakhs, 3].
Using the Manhattan distance:
Distance = |60 - 50| + |3500000 - 2500000| + |3 - 3| = 10 + 1000000 + 0 = 1000010
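The Manhattan calculation above can be reproduced in Python to show how much of the distance each feature contributes. The vectors are the example's [Age, Income, brand] values, with Income written out in full (35 lakhs = 3,500,000); the variable names are illustrative.

```python
features = ["Age", "Income", "brand"]
point = [60, 3_500_000, 3]      # data point to classify
centroid = [50, 2_500_000, 3]   # centroid of Class 1

# Absolute difference each feature contributes to the Manhattan distance
contributions = {name: abs(p - c) for name, p, c in zip(features, point, centroid)}
total = sum(contributions.values())

for name, diff in contributions.items():
    print(f"{name:>6}: {diff:>9,} ({diff / total:.4%} of the distance)")
print("Manhattan distance:", total)  # 1000010
```

Nearly the entire distance comes from the Income term.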
The Income term (1,000,000) dwarfs the Age term (10), so Income will dominate the distance, and therefore the predicted class, purely because of its larger numeric scale.
Yet the features are independent of one another: a person's income says nothing about his or her age or the type of flat he or she needs, so there is no reason Income should carry more weight than the other features.
As a result, the model's predictions will be unreliable: they are driven almost entirely by Income, and Age and brand are effectively ignored.
Feature scaling is a straightforward solution to this problem. Age, Income, and brand are rescaled to a common range, such as [0, 1] or [-1, 1], so that no feature dominates the distance merely because of its units.
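A minimal sketch of min-max scaling, one common way to map features into [0, 1] or [-1, 1], applied to the same example. The feature ranges (Age 20 to 60, Income 100,000 to 4,000,000, brand 1 to 5) are assumptions taken from the example; in practice the minimum and maximum would be computed from the training data. The helper names (`min_max_scale`, `scale`, `RANGES`) are hypothetical.

```python
def min_max_scale(value, lo, hi, new_lo=0.0, new_hi=1.0):
    """Linearly map value from [lo, hi] to [new_lo, new_hi]."""
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)

# Assumed (min, max) for each feature, in the order Age, Income, brand
RANGES = [(20, 60), (100_000, 4_000_000), (1, 5)]

def scale(vector, new_lo=0.0, new_hi=1.0):
    """Scale every feature of a vector into [new_lo, new_hi]."""
    return [min_max_scale(v, lo, hi, new_lo, new_hi)
            for v, (lo, hi) in zip(vector, RANGES)]

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

point = [60, 3_500_000, 3]
centroid = [50, 2_500_000, 3]
sp, sc = scale(point), scale(centroid)

per_feature = [abs(a - b) for a, b in zip(sp, sc)]
print(per_feature)          # roughly [0.25, 0.256, 0.0]
print(manhattan(sp, sc))    # roughly 0.506
```

After scaling, Age and Income contribute about equally instead of Income swamping the result. scikit-learn offers the same transformation as `sklearn.preprocessing.MinMaxScaler`, whose `feature_range` parameter selects the target interval.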