How to Perform a Z Test in R Language: A Step-by-Step Guide

Introduction: In statistics, a z-test is a hypothesis test used to determine whether a population mean differs significantly from a hypothesized value (a one-sample test) or whether two population means differ from each other (a two-sample test). It is a parametric test that applies when certain assumptions are met: the population standard deviation is known, and either the population is normally distributed or the sample is large enough for the sample mean to be approximately normal. In this how-to guide, we walk through the process of performing a one-sample z-test in R.

Step 1: Import Your Data

The first step in performing a z-test in R is to import your data into R. This can be done using the read.csv() or read.table() functions. For example, if your data is stored in a CSV file called "data.csv", you can import it using the following code:

data <- read.csv("data.csv")
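As a quick sanity check after importing, it can help to confirm that the column you plan to test is numeric and to see how many values are missing. The sketch below is only an illustration: it writes a small made-up CSV to a temporary file so that it runs on its own, and the column name "score" is a hypothetical stand-in for your own column.

```r
# Create a small made-up CSV so the example is self-contained
# (the file contents and the column name "score" are assumptions)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,22.5", "2,21.8", "3,NA", "4,24.1"), csv_path)

data <- read.csv(csv_path)

# Inspect the structure and confirm the column is numeric
str(data)
is_numeric_col <- is.numeric(data$score)

# Count missing values; mean() and sd() return NA unless they are handled
n_missing <- sum(is.na(data$score))
```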

Step 2: Calculate the Sample Mean and Standard Deviation

Once you have imported your data, you need to calculate the sample mean and standard deviation. You can use the mean() and sd() functions to do this, respectively. For example:

sample_mean <- mean(data$column_name)
sample_sd <- sd(data$column_name)

Replace "column_name" with the name of the column that contains the data you want to analyze.
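If the column contains missing values, mean() and sd() return NA by default. A minimal sketch, using a made-up vector in place of data$column_name, shows one way to handle this and to record the sample size that the later steps need:

```r
# Made-up values standing in for data$column_name (an assumption)
values <- c(10.4, 9.8, NA, 10.9, 10.1, 9.6)

# Drop missing values explicitly so n matches what mean() and sd() use
clean <- values[!is.na(values)]

n <- length(clean)
sample_mean <- mean(clean)
sample_sd <- sd(clean)   # sd() uses the n - 1 denominator
```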

Step 3: Set Your Hypotheses and Confidence Level

Before you can perform a z-test, you need to set your hypotheses and confidence level. The null hypothesis (H0) is that the population mean is equal to a specific value, while the alternative hypothesis (Ha) is that the population mean is different from that value. The confidence level is 1 - alpha, where the significance level alpha is the probability of rejecting the null hypothesis when it is actually true. The most common confidence level is 95%, which corresponds to alpha = 0.05.

For example, if you want to test whether the population mean is equal to 10, your hypotheses would be:

H0: mu = 10
Ha: mu != 10
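In code, these choices reduce to two numbers: the hypothesized mean and the significance level. The sketch below stores them and also computes the two-tailed critical value with qnorm(); the variable names are illustrative assumptions.

```r
# Hypothesized mean under H0 and significance level (assumed names)
hypothesized_mean <- 10
alpha <- 0.05                 # corresponds to a 95% confidence level

# Two-tailed critical value: reject H0 when |z| exceeds this
z_critical <- qnorm(1 - alpha / 2)
```

For alpha = 0.05 the critical value is approximately 1.96.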

Step 4: Calculate the Test Statistic

To calculate the test statistic for the z-test, you can use the following formula:

z <- (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))

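Where "hypothesized_mean" is the value of the population mean specified in the null hypothesis, and "n" is the sample size (for example, n <- length(data$column_name)). Strictly speaking, a z-test uses the known population standard deviation in the denominator. Substituting sample_sd is an approximation that is reasonable only for large samples; for small samples with an unknown population standard deviation, a t-test is the usual choice.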
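Plugging made-up numbers into the formula gives a sense of scale. The values below are assumptions chosen for illustration, not results from real data:

```r
# Illustrative summary values (assumptions, not real data)
sample_mean <- 10.4
sample_sd <- 2          # used in place of a known population SD (large-sample approximation)
n <- 100
hypothesized_mean <- 10

# Standard error of the mean, then the z statistic
standard_error <- sample_sd / sqrt(n)
z <- (sample_mean - hypothesized_mean) / standard_error
```

Here the standard error is 0.2, so z works out to 2 (up to floating-point rounding): the sample mean lies two standard errors above the hypothesized mean.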

Step 5: Calculate the p-value

Once you have calculated the test statistic, you can use it to calculate the p-value. The p-value is the probability of obtaining a test statistic at least as extreme as the one you calculated, assuming the null hypothesis is true. You can use the pnorm() function to calculate the p-value. For example:

p_value <- 2 * (1 - pnorm(abs(z)))

The factor of 2 is included because this is a two-tailed test.
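For one-tailed alternatives the factor of 2 is dropped and only one tail is used. The sketch below shows the three cases for an assumed z value. It calls pnorm() with lower.tail = FALSE, which is equivalent to 1 - pnorm(z) but keeps more precision when the tail probability is very small.

```r
z <- 2.1   # an assumed test statistic for illustration

# Two-tailed: Ha is mu != hypothesized value
p_two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Right-tailed: Ha is mu > hypothesized value
p_greater <- pnorm(z, lower.tail = FALSE)

# Left-tailed: Ha is mu < hypothesized value
p_less <- pnorm(z)
```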

Step 6: Compare the p-value to the Significance Level

The final step in performing a z-test is to compare the p-value to the significance level. If the p-value is less than the significance level (usually 0.05), then you can reject the null hypothesis and conclude that the population mean is significantly different from the hypothesized value. Otherwise, you fail to reject the null hypothesis.
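The comparison itself is a single condition. A minimal sketch, with an assumed p-value standing in for the one computed in Step 5:

```r
alpha <- 0.05        # significance level
p_value <- 0.012     # an assumed p-value for illustration

if (p_value < alpha) {
  decision <- "Reject the null hypothesis"
} else {
  decision <- "Fail to reject the null hypothesis"
}
print(decision)
```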

Code Example:

# Step 1: Import Your Data
data <- read.csv("data.csv")

# Step 2: Calculate the Sample Mean and Standard Deviation
sample_mean <- mean(data$column_name)
sample_sd <- sd(data$column_name)

# Step 3

Step 4: Calculate the test statistic

The test statistic for the z-test is calculated as:

z = (x̄ - μ) / (σ / √n)

where x̄ is the sample mean, μ is the population mean hypothesized under the null hypothesis, σ is the population standard deviation, and n is the sample size.

In R, we can use the following formula to calculate the test statistic:

z <- (x_bar - mu) / (sigma / sqrt(n))

Here's an example using a sample of six observations, a known population standard deviation of 1.2, and a hypothesized mean of 20:

x <- c(22.5, 21.8, 23.2, 24.1, 21.5, 22.3)
n <- length(x)
x_bar <- mean(x)
sigma <- 1.2
mu <- 20
z <- (x_bar - mu) / (sigma / sqrt(n))
z

Output:

[1] 5.239186

Step 5: Determine the p-value

Once we have calculated the test statistic, we need to determine the p-value: the probability of observing a test statistic at least as extreme as the one we calculated, assuming the null hypothesis is true.

In this case, we are performing a two-tailed test, so we need the probability of observing a z-value of 5.239186 or larger, or of -5.239186 or smaller. We can use the pnorm() function in R to find this probability:

p_value <- 2 * (1 - pnorm(abs(z)))
p_value

Output (rounded to three significant figures):

[1] 1.61e-07

The p-value is less than 0.05, which means that we can reject the null hypothesis and conclude that the population mean is significantly different from 20.

Step 6: Interpret the results

Based on the analysis, there is sufficient evidence to support the claim that the population mean is different from 20. The test statistic (z-value) is 5.239186 and the p-value is about 1.61e-07, which is far below the significance level of 0.05. Therefore, we reject the null hypothesis in favor of the alternative hypothesis that the population mean is different from 20.
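The steps above can be collected into a small helper. The function below is a sketch rather than a standard R function: base R does not ship a z-test, and the name z_test and its arguments are assumptions made for this example.

```r
# One-sample, two-tailed z-test with a known population SD.
# z_test and its argument names are hypothetical, not part of base R.
z_test <- function(x, mu, sigma, alpha = 0.05) {
  x <- x[!is.na(x)]                      # drop missing values
  stopifnot(is.numeric(x), length(x) > 0, sigma > 0)
  n <- length(x)
  z <- (mean(x) - mu) / (sigma / sqrt(n))
  p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
  list(
    z = z,
    p_value = p_value,
    reject_null = p_value < alpha
  )
}

# Usage with the sample from this guide
x <- c(22.5, 21.8, 23.2, 24.1, 21.5, 22.3)
result <- z_test(x, mu = 20, sigma = 1.2)
result$reject_null
```

Returning a list keeps the statistic, the p-value, and the decision together, so each can be inspected or reported separately.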

Conclusion

In this guide, we have learned how to perform a z-test in R to test the hypothesis about the population mean. We have walked through the steps involved in conducting a z-test, including defining the null and alternative hypotheses, specifying the level of significance, calculating the test statistic, determining the p-value, and interpreting the results.

We have also provided code examples to demonstrate how to perform a z-test in R. By following the steps outlined in this guide, you can confidently perform a z-test in R and draw conclusions about the population mean based on your analysis.

About the author: Daniel West

Tech Blogger & Researcher for JBI Training