Two Sample Proportions test in R-Complete Guide

by finnstats

The two-proportions z-test is used to compare two observed proportions.

This article explains the fundamentals of the two-proportions z-test and gives practical examples using R.

For example, suppose we have two groups of people:

Group A (people with lung cancer): n = 500

Group B (healthy people): n = 500

The number of smokers in each group is as follows:

Group A (people with lung cancer): n = 500, 450 smokers, pA = 450/500 = 0.9

Group B (healthy people): n = 500, 400 smokers, pB = 400/500 = 0.8

The overall proportion of smokers is p = (450 + 400) / (500 + 500) = 0.85

The overall proportion of non-smokers is q=1−p
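The same quantities can be computed in R. This is a minimal sketch using the counts from the example; the variable names (smokers, n, p_hat) are illustrative choices, not part of any package.

```r
# Counts from the example above
smokers <- c(A = 450, B = 400)   # number of smokers in each group
n       <- c(A = 500, B = 500)   # group sizes

p_hat <- smokers / n             # per-group proportions pA and pB
p     <- sum(smokers) / sum(n)   # overall (pooled) proportion of smokers
q     <- 1 - p                   # overall proportion of non-smokers

p_hat
p
q
```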

We’d like to know if the proportions of smokers in the two categories of people are the same.

The following are examples of typical research questions:

1. Is the proportion of smokers in group A (pA) the same as the proportion of smokers in group B (pB)?
2. Is the proportion of smokers in group A (pA) higher than that in group B (pB)?
3. Is the proportion of smokers in group A (pA) lower than that in group B (pB)?

The corresponding null hypotheses (H0) are:

1. H0: pA = pB
2. H0: pA ≤ pB
3. H0: pA ≥ pB

The corresponding alternative hypotheses (Ha) are:

1. Ha: pA ≠ pB (different)
2. Ha: pA > pB (greater)
3. Ha: pA < pB (less)

Note that:

Hypothesis 1 is tested with a two-tailed test.

Hypotheses 2 and 3 are tested with one-tailed tests.

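The test statistic is z = (pA − pB) / sqrt(pq/nA + pq/nB), where nA and nB are the sizes of the two groups and p and q are the overall proportions defined above.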

If |z| is less than 1.96, the difference is not significant at 5%.

If |z| is greater than or equal to 1.96, the difference is significant at 5%.

The p-value corresponding to the z-statistic can be read from a z-table. We’ll look at how to compute it in R.
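As a rough illustration, the z-statistic and its two-sided p-value can be computed by hand. The sketch below assumes the usual pooled form of the statistic, z = (pA − pB) / sqrt(pq/nA + pq/nB), and uses pnorm() from base R in place of a z-table; the variable names are illustrative.

```r
# Example data: 450 of 500 smokers in group A, 400 of 500 in group B
nA <- 500; nB <- 500
pA <- 450 / nA
pB <- 400 / nB

# Overall (pooled) proportions
p <- (450 + 400) / (nA + nB)
q <- 1 - p

# z-statistic under the assumed pooled formula (no continuity correction)
z <- (pA - pB) / sqrt(p * q / nA + p * q / nB)

# Two-sided p-value from the standard normal distribution
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

z            # about 4.43, well above 1.96
p_two_sided
```

Because |z| exceeds 1.96, the difference would be judged significant at the 5% level under this uncorrected calculation.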

Two Sample Proportions test in R

R functions: prop.test()

prop.test(x, n, p = NULL, alternative = "two.sided", correct = TRUE)

x: a vector of counts of successes

n: a vector of counts of trials

alternative: a character string specifying the alternative hypothesis; one of "two.sided" (the default), "greater" or "less"

correct: a logical indicating whether Yates’ continuity correction should be applied where possible

Note that prop.test() applies Yates’ continuity correction by default. The correction matters most when the expected number of successes or failures is small (less than 5).

If you don’t want the correction, use the prop.test() function’s additional argument correct = FALSE. TRUE is the default value.

(Set this option to FALSE to make the test mathematically equivalent to the uncorrected z-test of proportions.)
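The link between the uncorrected test and the z-test can be checked numerically. In this sketch, which assumes the pooled z formula z = (pA − pB) / sqrt(pq/nA + pq/nB), the chi-squared statistic reported by prop.test() with correct = FALSE should equal z squared; res_nc and z are illustrative names.

```r
# prop.test() without Yates' continuity correction
res_nc <- prop.test(x = c(450, 400), n = c(500, 500), correct = FALSE)

# Hand-computed pooled z-statistic for the same data
p <- (450 + 400) / (500 + 500)
q <- 1 - p
z <- (450 / 500 - 400 / 500) / sqrt(p * q / 500 + p * q / 500)

unname(res_nc$statistic)  # X-squared from the uncorrected test
z^2                       # square of the z-statistic
```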

Returning to the example, we test whether the proportions of smokers in the two groups are the same.

res <- prop.test(x = c(450, 400), n = c(500, 500))
res

2-sample test for equality of proportions with continuity correction
data:  c(450, 400) out of c(500, 500)
X-squared = 18.831, df = 1, p-value = 1.428e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 0.05417387 0.14582613
sample estimates:
prop 1 prop 2
   0.9    0.8

The function returns the following:

the value of Pearson’s chi-squared test statistic

a p-value and a 95 percent confidence interval for the difference in proportions

the estimated probability of success in each group (the proportion of smokers in the two groups)
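Each of these values can also be extracted from the returned object, which is a list of class "htest". A short sketch, using the same data as above:

```r
res <- prop.test(x = c(450, 400), n = c(500, 500))

res$statistic   # Pearson's chi-squared statistic (X-squared)
res$p.value     # p-value of the test
res$conf.int    # 95 percent confidence interval for pA - pB
res$estimate    # estimated proportions in the two groups
```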

Take note of the following:

To test whether the proportion of smokers in group A (pA) is less than the proportion of smokers in group B (pB), type this:

prop.test(x = c(450, 400), n = c(500, 500), alternative = "less")

Alternatively, to test whether the proportion of smokers in group A (pA) is greater than the proportion of smokers in group B (pB), type this:

prop.test(x = c(450, 400), n = c(500, 500), alternative = "greater")
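To compare the three alternatives side by side, the p-values can be collected in one vector. This is a small sketch; x, n and p_values are illustrative names.

```r
x <- c(450, 400)   # smokers in groups A and B
n <- c(500, 500)   # group sizes

# Run prop.test() once per alternative and keep only the p-value
p_values <- sapply(c("two.sided", "less", "greater"), function(alt) {
  prop.test(x = x, n = n, alternative = alt)$p.value
})

p_values
```

Since the observed proportion in group A is larger than in group B, the "greater" p-value is small (half the two-sided one), while the "less" p-value is close to 1.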

Interpretation of the result

The p-value of the test is 1.428e-05, which is below the significance level alpha = 0.05. We therefore reject the null hypothesis and conclude that the proportions of smokers in the two groups differ significantly.