Curve fitting in R Quick Guide

In this post, we’ll look at how to use the R programming language to fit a curve to the data in a data frame.

One of the most basic aspects of statistical analysis is curve fitting.

It helps identify patterns in data and predict unobserved values using a regression model or function.

Dataframe Visualization:

Before fitting a curve to a data frame in R, it helps to visualize the data with a basic scatter plot.

The plot() function in the R language can be used to construct a basic scatter plot.

Syntax:

plot( df$x, df$y)

where,

df: the data frame to be used.

x and y: the columns of df plotted on the horizontal and vertical axes.

Curve fitting in R

In R, you’ll frequently need to find the equation of the curve that best fits a set of data.

The following example shows how to use the poly() function in R to fit curves to data and how to determine which curve best matches the data.

Step 1: Gather data and visualize it

Let’s start by creating a fictitious dataset and then a scatterplot to show it:

# make a data frame

df <- data.frame(x=1:15,
                 y=c(2, 10, 20, 25, 20, 10, 19, 15, 19, 10, 6, 20, 31, 31, 40))
df

# make an x vs. y scatterplot

plot(df$x, df$y, pch=19, xlab='x', ylab='y')

Step 2: Fit Several Curves

Let’s now fit several polynomial regression models to the data and display each model’s curve in a single plot:

# fit polynomial regression models up to degree 5

fit1 <- lm(y~x, data=df)
fit2 <- lm(y~poly(x,2,raw=TRUE), data=df)
fit3 <- lm(y~poly(x,3,raw=TRUE), data=df)
fit4 <- lm(y~poly(x,4,raw=TRUE), data=df)
fit5 <- lm(y~poly(x,5,raw=TRUE), data=df)

# make an x vs. y scatterplot

plot(df$x, df$y, pch=19, xlab='x', ylab='y')

# define x-axis values

xaxis <- seq(1, 15, length=15)

# add each model's curve to the plot

lines(xaxis, predict(fit1, data.frame(x=xaxis)), col='green')
lines(xaxis, predict(fit2, data.frame(x=xaxis)), col='red')
lines(xaxis, predict(fit3, data.frame(x=xaxis)), col='blue')
lines(xaxis, predict(fit4, data.frame(x=xaxis)), col='pink')
lines(xaxis, predict(fit5, data.frame(x=xaxis)), col='yellow')
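The models above pass raw=TRUE to poly(), which makes the coefficients apply to the plain powers x, x^2, x^3 and so on. As a minimal sketch (the names fit_raw, fit_terms and fit_ortho are illustrative and not part of the original example), the following checks that a cubic written with raw=TRUE, the same cubic written out with I(), and the default orthogonal form of poly() all produce the same fitted curve:

```r
# Same fictitious data as in Step 1
df <- data.frame(x = 1:15,
                 y = c(2, 10, 20, 25, 20, 10, 19, 15, 19, 10, 6, 20, 31, 31, 40))

# Three ways to specify a cubic fit (object names are illustrative)
fit_raw   <- lm(y ~ poly(x, 3, raw = TRUE), data = df)  # raw powers x, x^2, x^3
fit_terms <- lm(y ~ x + I(x^2) + I(x^3), data = df)     # same powers written out
fit_ortho <- lm(y ~ poly(x, 3), data = df)              # orthogonal polynomials (default)

# The two raw forms have the same coefficient values (only the names differ)
same_coefs <- all.equal(unname(coef(fit_raw)), unname(coef(fit_terms)))

# All three forms give the same fitted values
same_fit <- all.equal(fitted(fit_raw), fitted(fit_ortho))

print(same_coefs)
print(same_fit)
```

The raw form is convenient when the goal is to read the equation of the curve directly from the coefficients. The orthogonal form has different coefficients but describes the same curve.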

The adjusted R-squared of each model can be used to determine which curve best matches the data.

This score indicates how much of the variation in the response variable can be explained by the model’s predictor variables, adjusted for the number of predictor variables.

# calculate each model's adjusted R-squared

summary(fit1)$adj.r.squared
summary(fit2)$adj.r.squared
summary(fit3)$adj.r.squared
summary(fit4)$adj.r.squared
summary(fit5)$adj.r.squared

[1] 0.3078264
[1] 0.3414538
[1] 0.7168905
[1] 0.7362472
[1] 0.711871

We can see from the output that the fourth-degree polynomial has the highest adjusted R-squared, at 0.7362472.
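To make the adjustment concrete, here is a hedged sketch that recomputes the adjusted R-squared by hand using the standard formula 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number of observations and p is the number of predictor terms. The helper adj_r2() is a hypothetical name introduced for illustration, and the five models are refitted in a loop rather than one by one:

```r
# Same fictitious data as in Step 1
df <- data.frame(x = 1:15,
                 y = c(2, 10, 20, 25, 20, 10, 19, 15, 19, 10, 6, 20, 31, 31, 40))

# Hypothetical helper: adjusted R-squared from the standard formula
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)          # number of observations
  p  <- fit$rank - 1       # estimated predictor terms, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Fit polynomial models of degree 1 to 5
fits <- lapply(1:5, function(d) lm(y ~ poly(x, d, raw = TRUE), data = df))

by_hand  <- sapply(fits, adj_r2)
built_in <- sapply(fits, function(f) summary(f)$adj.r.squared)

# The hand computation should agree with summary()
print(all.equal(by_hand, built_in))

# Degree with the highest adjusted R-squared
print(which.max(built_in))
```

Because the penalty grows with p, adding a higher-degree term raises the adjusted R-squared only if it improves the fit by more than the extra parameter costs.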

Step 3: Visualize the final curve

Finally, we can make a scatterplot with the fourth-degree polynomial model’s curve.


# make an x vs. y scatterplot

plot(df$x, df$y, pch=19, xlab='x', ylab='y')

# define x-axis values

xaxis <- seq(1, 15, length=15)

lines(xaxis, predict(fit4, data.frame(x=xaxis)), col='blue')
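With only 15 x-axis values, the curve is drawn as straight segments between the observed points. As an optional sketch, evaluating the model on a denser grid gives a smoother line. The names x_grid and y_grid and the choice of 200 points are assumptions made for illustration:

```r
# Same fictitious data and fourth-degree model as above
df <- data.frame(x = 1:15,
                 y = c(2, 10, 20, 25, 20, 10, 19, 15, 19, 10, 6, 20, 31, 31, 40))
fit4 <- lm(y ~ poly(x, 4, raw = TRUE), data = df)

# Dense grid of x values spanning the observed range (200 points is arbitrary)
x_grid <- seq(min(df$x), max(df$x), length.out = 200)
y_grid <- predict(fit4, newdata = data.frame(x = x_grid))

# Scatterplot of the data with the smooth fitted curve on top
plot(df$x, df$y, pch = 19, xlab = 'x', ylab = 'y')
lines(x_grid, y_grid, col = 'blue')
```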

The summary() function can also be used to obtain the coefficients of the equation for this curve.

summary(fit4)

Call:
lm(formula = y ~ poly(x, 4, raw = TRUE), data = df)

Residuals:
   Min     1Q Median     3Q    Max
-8.848 -1.974  0.902  2.599  6.836

Coefficients:
                          Estimate Std. Error t value Pr(>|t|) 
(Intercept)             -18.717949  10.779889  -1.736   0.1131 
poly(x, 4, raw = TRUE)1  24.131130   8.679703   2.780   0.0194 *
poly(x, 4, raw = TRUE)2  -4.832890   2.109754  -2.291   0.0450 *
poly(x, 4, raw = TRUE)3   0.354756   0.195173   1.818   0.0992 .
poly(x, 4, raw = TRUE)4  -0.008147   0.006060  -1.344   0.2085 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.283 on 10 degrees of freedom
Multiple R-squared:  0.8116,       Adjusted R-squared:  0.7362
F-statistic: 10.77 on 4 and 10 DF,  p-value: 0.0012

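The coefficient estimates in the summary give the equation of the fitted curve:

y = -18.717949 + 24.131130*x - 4.832890*x^2 + 0.354756*x^3 - 0.008147*x^4

Plugging a value of x into this equation gives the predicted value of the response variable.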
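As a minimal sketch of forecasting with the fitted model, the code below predicts y for two new x values using predict(), then repeats the calculation by hand from the coefficients. The x values 4.5 and 12 are hypothetical and chosen only for illustration:

```r
# Same fictitious data and fourth-degree model as above
df <- data.frame(x = 1:15,
                 y = c(2, 10, 20, 25, 20, 10, 19, 15, 19, 10, 6, 20, 31, 31, 40))
fit4 <- lm(y ~ poly(x, 4, raw = TRUE), data = df)

# Hypothetical new x values, inside the observed range 1 to 15
new_x <- data.frame(x = c(4.5, 12))

# Prediction using the model object
pred <- predict(fit4, newdata = new_x)

# Same prediction by hand: b0 + b1*x + b2*x^2 + b3*x^3 + b4*x^4
b <- unname(coef(fit4))
by_hand <- sapply(new_x$x, function(x) sum(b * x^(0:4)))

print(pred)
print(all.equal(unname(pred), by_hand))
```

Polynomial curves can swing sharply outside the range of the data used to fit them, so predictions for x values below 1 or above 15 should be treated with caution.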