One proportion Z Test in R
The one-proportion Z-test is a statistical test used to determine whether the proportion of a categorical outcome observed in a sample differs from a hypothesized population proportion by more than chance alone would explain.
It is a hypothesis-testing method that helps researchers make inferences about a population based on a sample. In this article, we will discuss how to perform a one-proportion Z-Test in R.
Formulation of the Hypothesis:
Before performing a one-proportion Z-Test, it is necessary to formulate the null and alternative hypotheses.
The null hypothesis (H0) states that the population proportion equals the hypothesized value.
It is usually written as:
H0: p = p0
where p is the population proportion with a particular characteristic (estimated from the sample) and p0 is the hypothesized proportion.
The alternative hypothesis (H1) states that the population proportion differs from the hypothesized value. It can be either one-tailed or two-tailed and is usually written as:
H1: p ≠ p0 (two-tailed)
H1: p > p0 (one-tailed)
H1: p < p0 (one-tailed)
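As a rough illustration, the three alternatives above map onto the alternative argument of R's prop.test function. The counts below (120 successes out of 1000) are hypothetical, and correct = FALSE is an assumption made here so that the result corresponds to the uncorrected Z-test.

```r
# Hypothetical counts, chosen only for illustration: 120 successes in 1000 trials
x  <- 120
n  <- 1000
p0 <- 0.10

# One call per alternative hypothesis; correct = FALSE turns off the Yates
# continuity correction that prop.test applies by default
alternatives <- c("two.sided", "greater", "less")
p_values <- sapply(alternatives, function(alt) {
  prop.test(x, n, p = p0, alternative = alt, correct = FALSE)$p.value
})

print(signif(p_values, 3))
```

The two one-tailed p-values sum to 1, and the two-tailed p-value is twice the smaller of them, which is one quick way to check that the direction of the test was specified as intended.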
In the following sections, we will provide examples of how to perform a one-proportion Z-Test in R.
Example 1: One-Tailed Z-Test
In this example, we will use a dataset that contains information about 1000 people and whether or not they have a specific disease.
We want to test the hypothesis that the proportion of people with the disease is greater than 10% using a one-tailed Z-Test.
First, we need to load the dataset:
disease_data <- read.csv("disease_data.csv")
Next, we can calculate the proportion of people with the disease:
n_total <- nrow(disease_data)
n_disease <- sum(disease_data$disease == "Yes")
p_disease <- n_disease / n_total
Then, we can specify the null and alternative hypotheses:
p0 <- 0.1
H0 <- paste0("p = ", p0)
H1 <- paste0("p > ", p0)
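We can now conduct the one-tailed Z-Test using the ‘prop.test’ function. By default ‘prop.test’ applies Yates' continuity correction, so we set correct = FALSE to obtain the classical Z-Test:
z_test <- prop.test(n_disease, n_total, p = p0, alternative = "greater", correct = FALSE)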
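Finally, we can inspect the result. ‘prop.test’ returns an object of class "htest"; calling ‘summary’ on it only lists its components, so we print the object and extract the fields we need directly:
print(z_test)
c(z_test$statistic, p.value = z_test$p.value)
The output reports an X-squared statistic, the p-value, a confidence interval, and the sample proportion. For a single proportion, the square root of X-squared is the absolute value of the Z statistic (continuity-corrected unless correct = FALSE is set). The output does not include a critical value or a verdict; the decision comes from comparing the p-value with the chosen significance level.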
If the p-value is less than 0.05, we reject the null hypothesis and conclude that the proportion of people with the disease is significantly greater than 10%. Otherwise, we fail to reject the null hypothesis.
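Because disease_data.csv is not included with this article, the following sketch reproduces the workflow on simulated data. The seed, the simulated prevalence of 13%, and the significance level of 0.05 are assumptions chosen for illustration. It also computes the Z statistic and the critical value by hand, since ‘prop.test’ does not report them.

```r
set.seed(42)  # arbitrary seed so the simulated data is reproducible

# Simulated stand-in for disease_data.csv (hypothetical true prevalence: 13%)
disease_data <- data.frame(
  disease = sample(c("Yes", "No"), size = 1000, replace = TRUE,
                   prob = c(0.13, 0.87))
)

n_total   <- nrow(disease_data)
n_disease <- sum(disease_data$disease == "Yes")
p0        <- 0.10
alpha     <- 0.05

z_test <- prop.test(n_disease, n_total, p = p0,
                    alternative = "greater", correct = FALSE)

# prop.test reports X-squared; for one proportion, z is its signed square root
z_stat     <- unname(sign(z_test$estimate - p0) * sqrt(z_test$statistic))
z_critical <- qnorm(1 - alpha)  # upper-tail critical value, about 1.645
p_value    <- z_test$p.value

cat("z =", round(z_stat, 3), " critical value =", round(z_critical, 3),
    " p-value =", signif(p_value, 3), "\n")
cat(if (p_value < alpha) "Reject H0" else "Fail to reject H0", "\n")
```

Comparing z_stat with z_critical and comparing p_value with alpha are two views of the same decision, so they always agree.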
Example 2: Two-Tailed Z-Test
In this example, we will use a dataset that contains information about 1000 people and whether or not they have a specific gene variant.
We want to test the hypothesis that the proportion of people with the gene variant is not equal to 15% using a two-tailed Z-Test.
Let’s load the dataset:
gene_data <- read.csv("gene_data.csv")
Next, we can calculate the proportion of people with the gene variant:
n_total <- nrow(gene_data)
n_variant <- sum(gene_data$variant == "Yes")
p_variant <- n_variant / n_total
Then, we can specify the null and alternative hypotheses:
p0 <- 0.15
H0 <- paste0("p = ", p0)
H1 <- paste0("p ≠ ", p0)
We can now conduct the two-tailed Z-Test using the ‘prop.test’ function, again setting correct = FALSE to turn off the default continuity correction:
z_test <- prop.test(n_variant, n_total, p = p0, alternative = "two.sided", correct = FALSE)
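Finally, we can inspect the result. Calling ‘summary’ on an "htest" object only lists its components, so we print the object and extract the fields directly:
print(z_test)
c(z_test$statistic, p.value = z_test$p.value)
The output reports the X-squared statistic, the p-value, a 95% confidence interval for the proportion, and the sample proportion. It does not include a critical value or a verdict; the decision comes from comparing the p-value with the chosen significance level.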
If the p-value is greater than 0.05, we fail to reject the null hypothesis: the data provide no evidence that the proportion of people with the gene variant differs from 15%. If it is less than 0.05, we reject the null hypothesis.
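The two-tailed workflow can be sketched on simulated data in the same way. The seed and the simulated variant frequency of 15% are assumptions made for illustration, not properties of gene_data.csv. The sketch also prints the confidence interval that ‘prop.test’ returns, which offers a second way to read the two-tailed result.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Simulated stand-in for gene_data.csv (hypothetical true frequency: 15%)
gene_data <- data.frame(
  variant = sample(c("Yes", "No"), size = 1000, replace = TRUE,
                   prob = c(0.15, 0.85))
)

n_total   <- nrow(gene_data)
n_variant <- sum(gene_data$variant == "Yes")
p0        <- 0.15

z_test <- prop.test(n_variant, n_total, p = p0,
                    alternative = "two.sided", correct = FALSE)

ci      <- z_test$conf.int  # 95% Wilson score interval for the proportion
p_value <- z_test$p.value

cat("Sample proportion:", n_variant / n_total, "\n")
cat("95% CI:", round(ci[1], 4), "to", round(ci[2], 4), "\n")
cat("p-value:", signif(p_value, 3), "\n")

# Without the continuity correction, p0 falls inside the 95% interval
# exactly when the two-tailed p-value exceeds 0.05
p0_inside <- p0 >= ci[1] && p0 <= ci[2]
cat("p0 inside the interval:", p0_inside, "\n")
```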
Example 3: Conducting Z-Test using Manual Calculation
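In this example, we will provide a manual calculation for a one-tailed Z-Test. We will use the same dataset as Example 1. If Example 2 was run in the same R session, first recompute n_total and p_disease from disease_data and reset p0 to 0.1, because Example 2 overwrote n_total and p0.
First, we need to calculate the standard error of the proportion under the null hypothesis, which uses p0 rather than the sample proportion:
se <- sqrt(p0 * (1 - p0) / n_total)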
Next, we can calculate the test statistic:
z <- (p_disease - p0) / se
Finally, we can calculate the p-value using the ‘pnorm’ function:
p_value <- 1 - pnorm(z)
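This p-value matches the one from ‘prop.test’ only when the standard error is computed from p0 and ‘prop.test’ is called with correct = FALSE. With the default continuity correction, the ‘prop.test’ p-value is slightly larger.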
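A quick check of this agreement, using hypothetical counts (125 cases among 1000 people) rather than the article's dataset, might look like the following sketch. It compares the manual p-value with ‘prop.test’ both with and without the continuity correction.

```r
# Hypothetical counts used only for this check: 125 cases among 1000 people
n_disease <- 125
n_total   <- 1000
p0        <- 0.10

p_hat    <- n_disease / n_total
se0      <- sqrt(p0 * (1 - p0) / n_total)  # standard error under H0
z        <- (p_hat - p0) / se0
p_manual <- pnorm(z, lower.tail = FALSE)   # upper-tail p-value

p_uncorrected <- prop.test(n_disease, n_total, p = p0,
                           alternative = "greater", correct = FALSE)$p.value
p_corrected   <- prop.test(n_disease, n_total, p = p0,
                           alternative = "greater")$p.value  # correct = TRUE by default

print(all.equal(p_manual, p_uncorrected))  # manual and uncorrected results agree
print(p_corrected > p_uncorrected)         # the correction is more conservative here
```

Using pnorm(z, lower.tail = FALSE) instead of 1 - pnorm(z) gives the same value here and is numerically safer when z is large.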
Conclusion:
In this article, we have demonstrated how to perform a one-proportion Z-Test in R using both the ‘prop.test’ function and manual calculation.
The one-proportion Z-Test compares a sample proportion with a hypothesized population proportion and helps researchers make inferences about a population based on a sample.
By adapting the examples provided in this article, researchers can use the one-proportion Z-Test to test hypotheses related to proportions in their own datasets.