Area Under Curve in R (AUC)
When the response variable is binary, logistic regression is a common statistical method for fitting a regression model.
The following two metrics can be used to determine how well a logistic regression model fits a dataset.
Sensitivity: The probability that the model predicts a positive outcome for an observation when the outcome is actually positive. This is also called the “true positive rate.”
Specificity: The probability that the model predicts a negative outcome for an observation when the outcome is actually negative. This is also called the “true negative rate.”
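As a minimal sketch of these two definitions, the base R snippet below counts the cells of a confusion matrix and derives both rates. The vectors `actual` and `predicted_class` are hypothetical toy values made up for illustration, not taken from the Default dataset.

```r
# Toy labels and predicted classes (hypothetical values, for illustration only)
actual          <- c("Yes", "Yes", "Yes", "No", "No", "No", "No",  "No")
predicted_class <- c("Yes", "Yes", "No",  "No", "No", "No", "Yes", "No")

# Count the four cells of the confusion matrix
tp <- sum(actual == "Yes" & predicted_class == "Yes")  # true positives
fn <- sum(actual == "Yes" & predicted_class == "No")   # false negatives
tn <- sum(actual == "No"  & predicted_class == "No")   # true negatives
fp <- sum(actual == "No"  & predicted_class == "Yes")  # false positives

sensitivity <- tp / (tp + fn)  # true positive rate
specificity <- tn / (tn + fp)  # true negative rate

sensitivity
specificity
```

With these toy values, two of the three positives and four of the five negatives are classified correctly, so sensitivity is 2/3 and specificity is 0.8.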
One way to visualize these two measures is to draw a ROC curve, which stands for “receiver operating characteristic” curve.
The ROC curve plots sensitivity on the y-axis against (1 – specificity) on the x-axis as the classification threshold varies. The AUC, or “area under the curve,” summarizes in a single number how well the logistic regression model classifies data.
The closer the AUC is to 1, the better the model. An AUC of 0.5 corresponds to random guessing.
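To make the link between the ROC curve and the AUC concrete, here is a minimal base R sketch that sweeps a threshold over predicted probabilities, collects the ROC points, and approximates the area with the trapezoidal rule. The vectors `actual` and `prob` are hypothetical toy values, and this is only an illustration of the idea, not how any particular package computes the AUC internally.

```r
# Toy data (hypothetical): 1 = positive, 0 = negative, with predicted probabilities
actual <- c(1, 1, 0, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# Classify as positive when prob >= threshold, for every candidate threshold
thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 0) / sum(actual == 0))

# Trapezoidal rule: width times average height between neighbouring ROC points
auc_manual <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc_manual

# plot(fpr, tpr, type = "l")  # uncomment to draw the curve
```

Here `tpr` is the sensitivity and `fpr` is (1 – specificity), so the points trace the curve from (0, 0) to (1, 1). For these toy values the area works out to 0.75.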
The example below explains how to calculate AUC in R for a logistic regression model step by step.
Step 1: Load the Data
We’ll start by loading the Default dataset from the ISLR package, which contains data on whether or not individuals have defaulted on their credit card debt.
# install.packages("ISLR")  # run once if the package is not installed
library(ISLR)

# load the dataset
df <- ISLR::Default

# view the first six rows
head(df)
The first six rows of the dataset look like this:

  default student   balance    income
1      No      No  729.5265 44361.625
2      No     Yes  817.1804 12106.135
3      No      No 1073.5492 31767.139
4      No      No  529.2506 35704.494
5      No      No  785.6559 38463.496
6      No     Yes  919.5885  7491.559
Step 2: Fit the Logistic Regression Model
Next, we’ll fit a logistic regression model to predict the probability that an individual defaults.
# make this example reproducible
set.seed(123)
# use roughly 70% of the dataset as a training set and the remaining 30% as a testing set
in_train <- sample(c(TRUE, FALSE), nrow(df), replace=TRUE, prob=c(0.7,0.3))
train <- df[in_train, ]
test <- df[!in_train, ]
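Because each row is assigned to the training set independently with probability 0.7, the split above is only approximately 70/30. If an exact split size matters, one alternative is to sample row indices without replacement. The sketch below uses a hypothetical toy data frame (`toy`, with made-up columns `id` and `x`) as a stand-in for `df`, so it is an illustration rather than a replacement for the code in this tutorial.

```r
set.seed(123)

# Hypothetical stand-in for the real data frame
toy <- data.frame(id = 1:10, x = rnorm(10))

# Draw exactly 70% of the row indices without replacement
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))

toy_train <- toy[train_idx, ]   # rows selected for training
toy_test  <- toy[-train_idx, ]  # all remaining rows

nrow(toy_train)
nrow(toy_test)
```

Note that switching to this approach in the tutorial would change which rows land in each set, so the model output and AUC shown later would differ slightly.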
# fit the logistic regression model
model <- glm(default~student+balance+income, family="binomial", data=train)

# view the model summary
model
Call:  glm(formula = default ~ student + balance + income, family = "binomial",
    data = train)

Coefficients:
(Intercept)   studentYes      balance       income
 -1.106e+01   -7.296e-01    5.940e-03   -2.305e-06

Degrees of Freedom: 7047 Total (i.e. Null);  7044 Residual
Null Deviance:      2013
Residual Deviance: 1043     AIC: 1051
Step 3: Calculate the Model’s AUC
Next, we’ll calculate the AUC of the model using the auc() function from the pROC package. The basic syntax is as follows, where response holds the observed classes and predictor holds the predicted probabilities:
auc(response, predictor)
In our example, here’s how to use this function:
# calculate the probability of default for each individual in the test dataset
predicted <- predict(model, test, type="response")
Now we can calculate the AUC:
library(pROC)
auc(test$default, predicted)

Area under the curve: 0.9361
Because this score is close to 1, the model is very good at distinguishing individuals who will default from those who will not.
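One way to read an AUC value is as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The base R sketch below illustrates this on simulated data (hypothetical, not the Default dataset, and evaluated on the same data used for fitting, purely for illustration). It computes the AUC once by comparing all positive/negative pairs and once from the Wilcoxon rank-sum statistic. All variable names here are made up for the example.

```r
set.seed(1)

# Simulated data (hypothetical, not the Default dataset)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))

# Fit a logistic regression and get predicted probabilities
fit  <- glm(y ~ x, family = "binomial")
prob <- predict(fit, type = "response")

# Compare every positive case with every negative case;
# a tie counts as half a win
pos  <- prob[y == 1]
neg  <- prob[y == 0]
wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")
auc_pairs <- mean(wins)

# The same quantity from the Wilcoxon rank-sum statistic
r     <- rank(prob)
n_pos <- length(pos)
n_neg <- length(neg)
auc_rank <- (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

auc_pairs
auc_rank
```

The two numbers agree, which is why an AUC near 1 means the model almost always ranks a true positive above a true negative.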