Bias Variance Tradeoff Machine Learning Tutorial
To assess a model’s performance on a dataset, we need to measure how well the model’s predictions match the observed data.
The mean squared error (MSE) is the most commonly used metric for regression models, and it is calculated as:
MSE = (1/n) * Σ(yi - f̂(xi))^2
where:
n: Total number of observations
yi: The response value of the ith observation
f̂(xi): The predicted response value of the ith observation
The closer the model’s predictions are to the observations, the smaller the MSE will be.
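As a minimal sketch, the formula above can be written in a few lines of plain Python. The function name mse and the sample numbers are made up for illustration and are not part of the original tutorial.

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum of (y_i - prediction_i)^2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one observation")
    squared_errors = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(y_true)


# Made-up observations and predictions, for illustration only.
observed = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]

# Squared errors are 0.25, 0.0 and 1.0, so the MSE is 1.25 / 3.
print(mse(observed, predicted))
```

Predictions that match the observations exactly give an MSE of zero, and the value grows as the predictions move away from the observations.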
However, what we really care about is the test MSE, which is the MSE obtained when the model is applied to data it has not seen before.
This is because we want to know how the model will perform on new data, not on the data that was used to fit it.
For example, a model that fits historical stock market values with a low MSE looks good, but what we actually want is for it to forecast future values accurately.
It turns out that the test MSE can be broken down into a variance term, a bias term, and an irreducible error. The first two depend on the model:
(1) Variance: This refers to how much our estimate f̂ would change if we fit it using a different training set.
(2) Bias: The error introduced by approximating a real-life problem, which may be exceedingly complicated, with a much simpler model.
In mathematical terms, this is:
Test MSE = Var(f̂(x0)) + [Bias(f̂(x0))]^2 + Var(ε)
Test MSE = Variance + Bias^2 + Irreducible error
The third term, the irreducible error, is the error that no model can reduce, because there is always some noise in the relationship between the explanatory variables and the response variable.
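The decomposition can be checked numerically with a small simulation. The sketch below is only an illustration under stated assumptions: the true relationship (a sine curve), the noise level, the evaluation point x0 = 1.5, and the choice of a straight-line model are all invented for the example. It fits the line to many simulated training sets, then compares the average squared error on a fresh observation at x0 with the sum of the three terms.

```python
import math
import random

NOISE_SD = 0.5      # assumed standard deviation of the noise term ε
X0 = 1.5            # assumed point at which predictions are evaluated
N_TRAIN = 30        # observations per simulated training set
N_REPEATS = 2000    # number of simulated training sets


def true_f(x):
    """Assumed true relationship; unknown in any real problem."""
    return math.sin(x)


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)
predictions = []      # f̂(x0) from each training set
squared_errors = []   # (y0 - f̂(x0))^2 for a fresh observation y0

for _ in range(N_REPEATS):
    xs = [rng.uniform(0, 6) for _ in range(N_TRAIN)]
    ys = [true_f(x) + rng.gauss(0, NOISE_SD) for x in xs]
    intercept, slope = fit_line(xs, ys)
    pred = intercept + slope * X0
    y0 = true_f(X0) + rng.gauss(0, NOISE_SD)
    predictions.append(pred)
    squared_errors.append((y0 - pred) ** 2)

mean_pred = sum(predictions) / N_REPEATS
variance = sum((p - mean_pred) ** 2 for p in predictions) / N_REPEATS
bias_sq = (mean_pred - true_f(X0)) ** 2
irreducible = NOISE_SD ** 2
test_mse = sum(squared_errors) / N_REPEATS

print(f"variance            = {variance:.3f}")
print(f"bias^2              = {bias_sq:.3f}")
print(f"irreducible error   = {irreducible:.3f}")
print(f"sum of three terms  = {variance + bias_sq + irreducible:.3f}")
print(f"simulated test MSE  = {test_mse:.3f}")
```

The last two printed values should be close but not identical, since the test MSE is estimated from a finite number of simulated training sets.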
Models with high bias tend to have low variance.
Linear regression models, for example, tend to have high bias (they assume a simple linear relationship between the explanatory variables and the response variable) and low variance (the model estimates don’t vary much from one sample to the next).
Models with low bias, on the other hand, tend to have high variance.
Complex non-linear models, for example, tend to have low bias (since they don’t assume a specific form for the relationship between the explanatory variables and the response variable) but high variance (the model estimates can change a lot from one training sample to the next).
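The contrast between a rigid and a flexible model can be illustrated with a small simulation. This is a sketch under assumptions that are not in the original text: the true relationship is taken to be a sine curve, the rigid model is a straight line, and the flexible model is a 1-nearest-neighbor predictor, chosen here only because it is easy to write by hand. Bias and variance are estimated at one assumed point, x0 = 1.5.

```python
import math
import random

NOISE_SD = 0.5      # assumed noise level
X0 = 1.5            # assumed evaluation point
N_TRAIN = 30        # observations per simulated training set
N_REPEATS = 2000    # number of simulated training sets


def true_f(x):
    """Assumed true relationship, used only to generate data."""
    return math.sin(x)


def linear_predict(xs, ys, x0):
    """Rigid model: least-squares straight line, evaluated at x0."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_y + (sxy / sxx) * (x0 - mean_x)


def nearest_neighbor_predict(xs, ys, x0):
    """Flexible model: response of the training point closest to x0."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_sq_and_variance(predict, seed=0):
    """Estimate bias^2 and variance of predict at X0 over many training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(N_REPEATS):
        xs = [rng.uniform(0, 6) for _ in range(N_TRAIN)]
        ys = [true_f(x) + rng.gauss(0, NOISE_SD) for x in xs]
        preds.append(predict(xs, ys, X0))
    mean_pred = sum(preds) / len(preds)
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    return (mean_pred - true_f(X0)) ** 2, variance


line_bias_sq, line_var = bias_sq_and_variance(linear_predict)
nn_bias_sq, nn_var = bias_sq_and_variance(nearest_neighbor_predict)

print(f"straight line:    bias^2 = {line_bias_sq:.3f}, variance = {line_var:.3f}")
print(f"nearest neighbor: bias^2 = {nn_bias_sq:.3f}, variance = {nn_var:.3f}")
```

In this setup the straight line shows the larger squared bias and the nearest-neighbor predictor shows the larger variance, which is the pattern described above. Other data-generating assumptions could give different magnitudes.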
Bias Variance Tradeoff Machine Learning
We face the bias-variance tradeoff when we must choose between lowering bias, which generally increases variance, and lowering variance, which generally increases bias.
This tradeoff can be visualized using the graph below:
Figure: The tradeoff between bias and variance
As the complexity of a model grows, bias falls and the total error falls with it, but only up to a point. Beyond that point, variance rises faster than bias falls, and the total error rises with it.
In practice, we are only interested in minimizing a model’s overall error, not necessarily its variance or bias.
It turns out that striking the correct balance between variance and bias is the key to minimizing total error.
To put it another way, we want a model that is complex enough to capture the correct link between the explanatory variables and the response variable, but not so complicated that it discovers patterns that don’t exist.
A model that is excessively complicated overfits the data. This occurs because the algorithm tries too hard to find patterns in the training data and picks up patterns that are only due to chance. On unseen data, this type of model is likely to perform poorly.
A model that is overly simple, on the other hand, underfits the data. This occurs when the true relationship between the explanatory variables and the response variable is assumed to be simpler than it really is.
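Overfitting and underfitting can be seen by comparing training and test MSE at different levels of complexity. The sketch below assumes a sine-shaped true relationship and uses a hand-written k-nearest-neighbors predictor, neither of which comes from the original text. For this model, a small k means a more complex model and a large k means a simpler one: k = 1 memorizes the training data, while k equal to the training set size predicts the overall mean everywhere.

```python
import math
import random


def true_f(x):
    """Assumed true relationship, used only to generate data."""
    return math.sin(x)


def make_data(n, rng, noise_sd=0.5):
    """Draw n noisy observations of true_f on the interval [0, 6]."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(x_train, y_train, x, k):
    """Average the responses of the k training points closest to x."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def mse(y_true, y_pred):
    """Mean squared error between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
x_train, y_train = make_data(100, rng)
x_test, y_test = make_data(1000, rng)

results = {}  # k -> (training MSE, test MSE)
for k in (1, 7, 100):
    train_pred = [knn_predict(x_train, y_train, x, k) for x in x_train]
    test_pred = [knn_predict(x_train, y_train, x, k) for x in x_test]
    results[k] = (mse(y_train, train_pred), mse(y_test, test_pred))
    print(f"k = {k:3d}: train MSE = {results[k][0]:.3f}, "
          f"test MSE = {results[k][1]:.3f}")
```

With k = 1 the training MSE is zero but the test MSE is not, which is the signature of overfitting. With k = 100 both errors are high, which is underfitting. The intermediate value gives the lowest test MSE of the three in this setup.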
In machine learning, the best way to choose a model is to find the balance between bias and variance that makes the model’s test error on future unseen data as small as possible.
Cross-validation is the most popular method for estimating test MSE in practice, which makes it possible to compare candidate models and choose the one with the lowest estimated test error.
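A minimal sketch of k-fold cross-validation is shown below, written in plain Python so that each step is visible. The simulated data, the k-nearest-neighbors model, the candidate values, and the function names are all assumptions made for illustration. Each fold is held out once, the model is fit on the remaining folds, and the held-out MSE values are averaged to estimate the test MSE for each candidate.

```python
import math
import random


def true_f(x):
    """Assumed true relationship, used only to generate data."""
    return math.sin(x)


def knn_predict(x_train, y_train, x, k):
    """Average the responses of the k training points closest to x."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def k_fold_cv_mse(xs, ys, k_neighbors, n_folds=5):
    """Estimate test MSE by averaging the held-out MSE over n_folds folds."""
    n = len(xs)
    fold_mses = []
    for fold in range(n_folds):
        # Observation i belongs to fold (i % n_folds); hold that fold out.
        x_fit = [xs[i] for i in range(n) if i % n_folds != fold]
        y_fit = [ys[i] for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        errors = [
            (ys[i] - knn_predict(x_fit, y_fit, xs[i], k_neighbors)) ** 2
            for i in held_out
        ]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / n_folds


rng = random.Random(0)
xs = [rng.uniform(0, 6) for _ in range(300)]
ys = [true_f(x) + rng.gauss(0, 0.5) for x in xs]

# Candidate complexities: 1 is the most flexible, 240 averages every
# point in each fitting set and so predicts a constant.
candidates = (1, 5, 15, 240)
cv_scores = {k: k_fold_cv_mse(xs, ys, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)

for k in candidates:
    print(f"k = {k:3d}: cross-validated MSE = {cv_scores[k]:.3f}")
print(f"selected k = {best_k}")
```

Because the data here are generated in random order, assigning folds by index is enough. With real data it is usually safer to shuffle the observations before splitting them into folds.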