Area Under Curve in R (AUC)

When the response variable is binary, logistic regression is a standard statistical method for fitting a regression model. This tutorial shows how to evaluate such a model in R using the area under the ROC curve (AUC).

The following two metrics can be used to determine how well a logistic regression model fits a dataset.

Sensitivity: The probability that the model predicts a positive outcome for an observation when the outcome is actually positive. This is also called the “true positive rate.”

Specificity: The probability that the model predicts a negative outcome for an observation when the outcome is actually negative. This is also called the “true negative rate.”
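
As a minimal sketch of these two definitions, the snippet below computes both metrics by hand on a small made-up set of labels and predicted probabilities. The vectors `actual` and `prob` and the 0.5 cutoff are assumptions for illustration only; they are not taken from the Default dataset used later.

```r
# Toy data (hypothetical, not from the Default dataset)
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)   # 1 = positive, 0 = negative
prob      <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05)
predicted <- as.integer(prob >= 0.5)            # classify at an assumed 0.5 cutoff

tp <- sum(predicted == 1 & actual == 1)  # true positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives
tn <- sum(predicted == 0 & actual == 0)  # true negatives
fp <- sum(predicted == 1 & actual == 0)  # false positives

sensitivity <- tp / (tp + fn)  # true positive rate
specificity <- tn / (tn + fp)  # true negative rate

sensitivity
specificity
```

Both values depend on the chosen cutoff: lowering it generally raises sensitivity and lowers specificity.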

One way to visualize these two metrics is a ROC curve, short for “receiver operating characteristic” curve.

The curve plots sensitivity on the y-axis against (1 – specificity) on the x-axis. The AUC, or “area under the curve,” summarizes in a single number how well the logistic regression model classifies the data.

The closer the AUC is to 1, the better the model. An AUC of 0.5 corresponds to a model that does no better than random guessing.
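
To make the construction concrete, here is a rough base R sketch that builds the ROC points by sweeping a cutoff over the scores and then approximates the area with the trapezoidal rule. The vectors `actual` and `score` are made-up toy data, and this hand-rolled calculation is only an illustration, not how the pROC package computes AUC internally.

```r
# Toy data (hypothetical): 1 = positive, 0 = negative
actual <- c(1, 1, 0, 1, 0, 0)
score  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

# Sweep the cutoff from high to low; Inf yields the (0, 0) corner of the curve
cutoffs <- c(Inf, sort(unique(score), decreasing = TRUE))

# At each cutoff, an observation is predicted positive if score >= cutoff
tpr <- sapply(cutoffs, function(k) mean(score[actual == 1] >= k))  # sensitivity
fpr <- sapply(cutoffs, function(k) mean(score[actual == 0] >= k))  # 1 - specificity

# Area under the curve via the trapezoidal rule
auc_manual <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc_manual

# Draw the curve with the axes described above
plot(fpr, tpr, type = "l", xlab = "1 - specificity", ylab = "sensitivity")
abline(0, 1, lty = 2)  # diagonal = random guessing (AUC of 0.5)
```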

The example below explains how to calculate AUC in R for a logistic regression model step by step.

Step 1: Load the Data

We’ll start by loading the Default dataset from the ISLR package, which contains data on whether or not specific individuals have defaulted on a loan.

Load the dataset:

#install.packages("ISLR")
library(ISLR)
df <- ISLR::Default
head(df)

The first six rows of the dataset look like this:

  default student   balance    income
1      No      No  729.5265 44361.625
2      No     Yes  817.1804 12106.135
3      No      No 1073.5492 31767.139
4      No      No  529.2506 35704.494
5      No      No  785.6559 38463.496
6      No     Yes  919.5885  7491.559

Step 2: Fit the Logistic Regression Model

Next, we’ll fit a logistic regression model to predict the probability that an individual defaults.

Set a seed to make the example reproducible:

set.seed(123)

Use roughly 70% of the dataset as a training set and the remaining 30% as a test set. Each row is assigned independently, so the split is only approximately 70/30.

sample <- sample(c(TRUE, FALSE), nrow(df), replace=TRUE, prob=c(0.7,0.3))
train <- df[sample, ]
test <- df[!sample, ]
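
If an exact split is preferred, one possible alternative is to sample row indices without replacement. The sketch below uses a small made-up data frame, `toy`, as a stand-in for `df`; the names `toy`, `train_idx`, `toy_train`, and `toy_test` are hypothetical and not part of the original example.

```r
set.seed(123)

# Hypothetical toy data frame standing in for df
toy <- data.frame(id = 1:100, x = rnorm(100))

# Draw exactly 70% of the row indices without replacement
train_idx <- sample(nrow(toy), size = floor(0.7 * nrow(toy)))
toy_train <- toy[train_idx, ]   # rows selected for training
toy_test  <- toy[-train_idx, ]  # all remaining rows

nrow(toy_train)
nrow(toy_test)
```

Either approach works for this tutorial; the index-based version simply guarantees the set sizes.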

Fit the logistic regression model:

model <- glm(default~student+balance+income, family="binomial", data=train)
model
Call:  glm(formula = default ~ student + balance + income, family = "binomial",
    data = train)
Coefficients:
(Intercept)   studentYes      balance       income 
 -1.106e+01   -7.296e-01    5.940e-03   -2.305e-06 
Degrees of Freedom: 7047 Total (i.e. Null);  7044 Residual
Null Deviance:        2013
Residual Deviance: 1043               AIC: 1051

Step 3: Determine the Model’s AUC

The AUC of the model will then be calculated using the auc() function from the pROC package. The syntax for this function is as follows.

auc(response, predictor)

Here response holds the observed classes and predictor holds the model’s predicted scores. In our example, here’s how to use this function:

First, calculate the probability of default for each individual in the test dataset.

predicted <- predict(model, test, type="response")

Now we can calculate the AUC

library(pROC)
auc(test$default, predicted)
Area under the curve: 0.9361

Because this value is close to 1, the model does a very good job of predicting whether or not an individual will default on their loan.
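
To see the ROC curve itself rather than only its area, pROC’s roc() function can build a curve object that plot() then draws. The sketch below runs on simulated data so that it is self-contained; the vectors `y` and `score` are assumptions for illustration and will not reproduce the AUC reported above. Setting `levels` and `direction` explicitly is a choice made here to avoid relying on pROC’s automatic detection.

```r
library(pROC)

set.seed(123)

# Simulated data (hypothetical): scores tend to be higher for the positive class
y     <- rbinom(200, size = 1, prob = 0.3)
score <- y + rnorm(200)

# levels = c(control, case); direction "<" means controls tend to score lower
roc_obj <- roc(response = y, predictor = score,
               levels = c(0, 1), direction = "<", quiet = TRUE)

# Extract the AUC as a plain number
auc_value <- as.numeric(auc(roc_obj))
auc_value

# legacy.axes = TRUE labels the x-axis as 1 - specificity
plot(roc_obj, legacy.axes = TRUE, print.auc = TRUE)
```

The same pattern should apply to the fitted model by passing test$default as the response and the predicted probabilities as the predictor.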
