Before we dive into XGBoost’s assumptions, let’s first take an overview of the algorithm.
Extreme Gradient Boosting, often known as XGBoost, is a supervised learning technique that belongs to the family of machine learning algorithms known as gradient-boosted decision trees (GBDT).
Boosting to XGBoost
Boosting is the process of combining a group of weak learners into a strong learner in order to reduce the number of training errors.
Boosting makes the model more effective by addressing the bias-variance trade-off.
Various boosting algorithms exist, including XGBoost, Gradient Boosting, AdaBoost (Adaptive Boosting), and others.
Let’s now turn to XGBoost.
As previously mentioned, XGBoost is an extension of gradient-boosted decision trees (GBDT) that is renowned for its speed and performance.
Predictions are made by combining a number of simpler, weaker decision-tree models that are built sequentially.
Each tree asks if-then-else true/false questions about the features in order to predict the likelihood of a correct decision.
It is made up of three components:
- a loss function to be optimized.
- a weak learner to make predictions.
- an additive model that adds weak learners so that the combined model makes fewer mistakes.
Features of XGBoost
XGBoost has three notable features:
1. Gradient Tree Boosting
The tree ensemble model is trained additively: decision trees are added one at a time in a sequential, iterative procedure.
A fixed number of trees is added, and the value of the loss function should decrease with each iteration.
2. Regularized Learning
Regularized learning smooths the final learned weights by adding a complexity penalty to the loss function, which helps prevent overfitting.
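Concretely, XGBoost’s regularized objective (from the original XGBoost paper) combines the training loss with a complexity penalty for each tree:

Obj = Σᵢ l(yᵢ, ŷᵢ) + Σₖ Ω(fₖ),   with   Ω(f) = γT + (1/2) λ ‖w‖²

Here l is the loss function, fₖ is the k-th tree, T is the number of leaves in a tree, w is its vector of leaf weights, and γ and λ control how strongly complex trees are penalized.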
3. Shrinkage and Feature Subsampling
These two methods help prevent overfitting even further.
Shrinkage reduces the influence of each individual tree on the overall model, leaving room for future trees to improve it.
You may have seen feature subsampling in the Random Forest algorithm. Besides preventing overfitting, subsampling the columns (features) of the data also speeds up the concurrent computations of the parallel algorithm.
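These two techniques map onto real XGBoost hyperparameters: `eta` (the shrinkage, or learning rate) and `colsample_bytree` (the fraction of features sampled per tree). A minimal sketch with illustrative, untuned values:

```python
# Shrinkage and feature subsampling expressed as XGBoost parameters.
# The parameter names are real XGBoost options; the values are
# illustrative for this sketch, not tuned settings.
params = {
    "eta": 0.1,               # shrinkage: each tree contributes 10% of its raw prediction
    "colsample_bytree": 0.8,  # feature subsampling: each tree sees 80% of the columns
}

# With eta = 0.1, a tree that predicts 2.0 only moves the ensemble by 0.2,
# leaving room for later trees to refine the fit.
tree_prediction = 2.0
ensemble_update = params["eta"] * tree_prediction
```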
import xgboost as xgb
Four groups of XGBoost hyperparameters are distinguished:
- General parameters
- Booster parameters
- Learning task parameters
- Command line parameters
Before starting the XGBoost model, general parameters, booster parameters, and task parameters are set. Only the console version of XGBoost uses the command line parameters.
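As a hedged sketch, here is how the first three groups might be populated, using real XGBoost parameter names with illustrative values (command line parameters apply only to the console version, so none appear here):

```python
# The three parameter groups set before training. Parameter names are
# real XGBoost options; the values are illustrative, not tuned.
general_params = {"booster": "gbtree", "nthread": 4}                    # which booster, threading
booster_params = {"eta": 0.3, "max_depth": 6, "lambda": 1.0}            # tree shape, regularization
task_params = {"objective": "reg:squarederror", "eval_metric": "rmse"}  # learning task

# XGBoost's training API takes them merged into a single dict, e.g.
# xgb.train(params, dtrain, num_boost_round=100)
params = {**general_params, **booster_params, **task_params}
```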
Improper parameter tuning easily leads to overfitting, yet tuning the XGBoost model’s parameters is challenging.
What Assumptions Underlie XGBoost?
XGBoost’s major assumptions are:
XGBoost may assume that the encoded integer values of each input variable have an ordinal relationship.
XGBoost does not assume your data is complete (i.e. it can deal with missing values).
Because it does not assume that all values are present, the algorithm tolerates missing values by default.
Tree-based algorithms learn how to handle missing values during the training phase. This has the following consequences:
Sparsity is handled natively by XGBoost.
Categorical variables must be transformed into numeric variables, because XGBoost only handles numeric vectors.
A dense data frame (with few zeroes in the matrix) is thereby transformed into a very sparse matrix (with many zeroes).
This means that data can be fed into XGBoost in the form of a sparse matrix.
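As a dependency-free sketch of why this happens, one-hot encoding a single categorical column already produces a mostly-zero matrix. In practice you would typically build the encoding with pandas or scikit-learn and pass the result to `xgb.DMatrix`, which also accepts scipy sparse matrices.

```python
# One-hot encoding a categorical column: the resulting numeric matrix
# has exactly one 1 per row and zeroes everywhere else, i.e. it is sparse.
colors = ["red", "green", "blue", "green", "red"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# 10 of the 15 entries are zero; with more categories the matrix gets
# sparser, which is why a sparse representation pays off.
n_zeroes = sum(row.count(0) for row in one_hot)
```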
You now know how boosting and XGBoost relate to one another, some of XGBoost’s features, and how it reduces both overfitting and the value of the loss function.
Continue to read and learn…