Correlation Analysis in R?

In correlation analysis, we estimate a sample correlation coefficient from experimental data. In most cases, the Pearson product-moment correlation coefficient is used to quantify the relationship.

The sample correlation coefficient is denoted r.

The correlation coefficient ranges between -1 and +1 and quantifies the direction and strength of the linear association between the two variables.

The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e., higher levels of one variable are associated with lower levels of the other).

The sign of the correlation coefficient indicates the direction of the association.

The magnitude of the correlation coefficient indicates the strength of the association.
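As a minimal sketch of how the coefficient is obtained, the block below evaluates the standard Pearson formula by hand and compares it with R's built-in cor(). The vectors x and y are made-up illustration values, not data from this article.

```r
# Illustrative data (hypothetical values chosen for this sketch)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviations of each observation from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson r: sum of cross-products of deviations, divided by the
# square root of the product of the sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent
r_builtin <- cor(x, y, method = "pearson")

print(r_manual)                        # about 0.775
print(all.equal(r_manual, r_builtin))  # TRUE
```

The sign of r_manual comes from the numerator (whether the deviations tend to move together), while the denominator rescales the result into the range from -1 to +1.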

For example, a correlation of r = 0.8 suggests a strong, positive association between two variables, whereas a correlation of r = -0.1 suggests a weak, negative association.

A correlation close to zero suggests no linear association between two continuous variables.
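The word "linear" matters here. The following sketch, using made-up values, shows a variable that is completely determined by x yet has a correlation of essentially zero with it, because the relationship is curved rather than linear.

```r
# Hypothetical values, symmetric around zero
x <- -5:5

y_linear <- 2 * x + 1  # exact straight-line relationship
y_curved <- x^2        # exact, but non-linear, relationship

r_linear <- cor(x, y_linear)
r_curved <- cor(x, y_curved)

print(r_linear)  # 1 (up to floating-point rounding)
print(r_curved)  # 0 (up to floating-point rounding)
```

A near-zero coefficient therefore rules out a linear trend only; plotting the data is the usual way to check for other patterns.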

To obtain a measure of the relationship between X and Y that is independent of the units of measurement, Karl Pearson developed a coefficient in the 1890s that is known as the Karl Pearson correlation coefficient.

The population correlation is denoted ρ and is called the product-moment correlation coefficient, or simply the correlation coefficient.
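To illustrate the independence from units, here is a small sketch on simulated data. The variable names, the simulated heights and weights, and the seed are all assumptions made for this example. Converting both variables to different units changes the covariance but leaves the correlation unchanged.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated (hypothetical) measurements in metric units
height_cm <- rnorm(20, mean = 170, sd = 10)
weight_kg <- 0.5 * height_cm + rnorm(20, sd = 5)

# The same measurements converted to imperial units
height_in <- height_cm / 2.54
weight_lb <- weight_kg * 2.20462

# Covariance depends on the units of measurement...
cov_metric   <- cov(height_cm, weight_kg)
cov_imperial <- cov(height_in, weight_lb)

# ...whereas the correlation does not
r_metric   <- cor(height_cm, weight_kg)
r_imperial <- cor(height_in, weight_lb)

print(c(cov_metric, cov_imperial))
print(all.equal(r_metric, r_imperial))  # TRUE
```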

Getting Data

dt <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10))
head(dt, 4)
          a          b           c            d
1 0.8160959  1.2173900 -0.97793080 -0.757270945
2 0.3974761  1.3211291 -0.00980259  0.656894857
3 0.2899615 -0.7997789 -0.71659935  0.488829146
4 0.6998316  0.1078887  0.99519040 -0.379013931

Measure the correlation between all the variables.

cor(dt)
            a          b           c          d
a  1.00000000  0.6310367 -0.04332633 -0.3316613
b  0.63103666  1.0000000 -0.27076891 -0.1930333
c -0.04332633 -0.2707689  1.00000000  0.1802828
d -0.33166128 -0.1930333  0.18028278  1.0000000

Visualization

Now we will look at how to plot the correlation results using the sjPlot package. Load the package into R:

library(sjPlot)
sjp.corr(dt)

In the plot, pink indicates a negative correlation and blue indicates a positive correlation.
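For readers who do not have sjPlot available, a rough alternative can be sketched with base R graphics only. This is not the sjPlot output; the demo data frame dt_demo, the seed, and the pink-white-blue palette are assumptions chosen to mimic the colour convention described above.

```r
set.seed(123)  # arbitrary seed for a reproducible sketch

# Hypothetical demo data with the same shape as the article's data frame
dt_demo <- data.frame(a = rnorm(10), b = rnorm(10),
                      c = rnorm(10), d = rnorm(10))
cm <- cor(dt_demo)

# Long format: one row per pair of variables, handy for inspection
cm_long <- as.data.frame(as.table(cm), responseName = "r")

# Diverging palette: pink for negative, white near zero, blue for positive
pal <- colorRampPalette(c("pink", "white", "blue"))(21)

# Draw the matrix as a grid of coloured cells
n <- ncol(cm)
image(1:n, 1:n, cm, zlim = c(-1, 1), col = pal,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:n, labels = rownames(cm))
axis(2, at = 1:n, labels = colnames(cm))

# Print the rounded coefficient inside each cell
text(row(cm), col(cm), labels = round(cm, 2))
```

Fixing zlim to the range from -1 to 1 keeps white centred on zero, so the colours stay comparable between data sets.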

When doing correlation analysis, statistical significance is also important.
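As a minimal sketch of what a significance test for a single pair looks like, the block below runs cor.test() on simulated data and reproduces its p-value from the usual t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The vectors x and y and the seed are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed for a reproducible sketch

# Simulated (hypothetical) pair of variables with a built-in association
x <- rnorm(10)
y <- 0.6 * x + rnorm(10)

# Built-in test of H0: the population correlation is zero
res <- cor.test(x, y, method = "pearson")

# The same test by hand
r <- unname(res$estimate)
n <- length(x)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)  # two-sided p-value

print(res$estimate)
print(res$p.value)
print(all.equal(p_manual, res$p.value))  # TRUE
```

With only 10 observations, even a fairly large sample correlation can fail to reach significance, which is why the p-value is worth reporting alongside the estimate.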

How to test the significance of correlations in R?

Load the following packages for the p-value calculation:

library(tidyverse)
library(broom)

# All unique pairs of column names, one pair per row
pairs <- t(combn(names(dt), 2))
dt1 <- tibble(x = pairs[, 1], y = pairs[, 2])
dt1

# Run a Pearson correlation test for each pair and tidy the results
cor_result <- dt1 %>%
  mutate(results = map2(x, y, ~ cor.test(dt[[.x]], dt[[.y]], method = "pearson")),
         results = map(results, tidy)) %>%
  unnest(cols = results)
cor_result

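Use the following code to extract the estimate and p-value. The 0.5 cutoff below only trims the display; the conventional significance threshold would be 0.05.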

cor_result %>% select(x, y, estimate, p.value) %>% filter(p.value < 0.5)
  x     y     estimate p.value
  <chr> <chr>    <dbl>   <dbl>
1 a     b        0.631  0.0504
2 a     d       -0.332  0.349
3 b     c       -0.271  0.449
