Calculating Z-Scores in Python: A Step-by-Step Guide

by finnstats

Z-scores are a fundamental concept in statistics: a z-score measures how many standard deviations a value lies above or below the mean.
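The definition reduces to z = (x - mean) / standard deviation. Below is a minimal sketch that uses only Python's standard library. The helper name z_score is hypothetical. The sketch assumes the population standard deviation (statistics.pstdev, dividing by n), which matches the default (ddof=0) of the SciPy function discussed below.

```python
from statistics import fmean, pstdev
from typing import Sequence


def z_score(x: float, values: Sequence[float]) -> float:
    """Hypothetical helper: how many population standard deviations x lies from the mean of values."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation: divides by n, not n - 1
    return (x - mean) / std


data = [6, 7, 7, 12, 13, 13, 15, 16, 19, 22]
print(round(z_score(22, data), 3))  # 1.793: 22 lies about 1.79 std devs above the mean of 13
print(z_score(13, data))            # 0.0: a value equal to the mean has a z-score of 0
```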


In this article, we’ll calculate z-scores in Python with SciPy’s zscore function, applied to NumPy arrays and pandas DataFrames.

Using SciPy’s zscore Function

The zscore function in SciPy’s stats module calculates z-scores for one-dimensional or multi-dimensional arrays. It takes the following arguments:

a: an array-like object containing the data
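axis: the axis along which the mean and standard deviation are computed (default is 0; None uses the whole array)
ddof: delta degrees of freedom used in the standard deviation (default is 0, i.e. the population standard deviation; 1 gives the sample standard deviation)
nan_policy: how to handle NaN values ('propagate', the default, returns NaN; 'raise' raises an error; 'omit' ignores NaN values in the calculation)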
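A short sketch of the ddof and nan_policy arguments follows. The arrays are illustrative. The comments on the 'omit' behavior reflect SciPy's documented semantics: NaN values are skipped when computing the mean and standard deviation, and they remain NaN in the output.

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

# ddof=0 (default): divide by the population standard deviation (n in the denominator)
z_pop = stats.zscore(data)
# ddof=1: divide by the sample standard deviation (n - 1 in the denominator),
# which pulls every nonzero z-score slightly toward 0
z_sample = stats.zscore(data, ddof=1)
print(np.round(z_pop[-1], 3), np.round(z_sample[-1], 3))

# nan_policy decides what happens when the input contains NaN
with_nan = np.array([1.0, 2.0, np.nan, 3.0])
z_propagate = stats.zscore(with_nan)                # 'propagate' (default): every entry is NaN
z_omit = stats.zscore(with_nan, nan_policy="omit")  # NaN skipped in mean/std, stays NaN in output
print(z_propagate)
print(z_omit)
```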

Example 1: Calculating Z-Scores for a One-Dimensional NumPy Array

Let’s start with a simple example using a one-dimensional NumPy array.

import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
z_scores = stats.zscore(data)
print(z_scores)

This will output (values rounded here to three decimals):

[-1.394 -1.195 -1.195 -0.199  0.    0.    0.398  0.598  1.195  1.793]

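Each z-score tells us how many standard deviations a value lies from the mean. For example, 22 is about 1.79 standard deviations above the mean of 13, while 6 is about 1.39 standard deviations below it.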
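To make the calculation concrete, the sketch below checks stats.zscore against the formula computed by hand with NumPy. It also confirms two properties of standardized data: the result has mean 0 and standard deviation 1, and mean + z * std maps the z-scores back to the original values.

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
z = stats.zscore(data)

# Same result by hand: subtract the mean, divide by the population std
# (np.std also uses ddof=0 by default, matching stats.zscore)
manual = (data - data.mean()) / data.std()
print(np.allclose(z, manual))  # True

# Standardized values have mean 0 and standard deviation 1 (up to rounding error)
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))  # True True

# Reversing the transform, x = mean + z * std, recovers the original values
recovered = data.mean() + z * data.std()
print(np.allclose(recovered, data))  # True
```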

Example 2: Calculating Z-Scores for a Multi-Dimensional NumPy Array

What if we have a multi-dimensional array? The axis parameter controls the direction of the calculation: axis=0 (the default) standardizes each column, and axis=1 standardizes each row. For example:


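import numpy as np
import scipy.stats as stats

# Each row is treated as a separate group of observations
data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])
z_scores = stats.zscore(data, axis=1)  # axis=1 standardizes across each row
print(z_scores)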

This will output (values rounded here to three decimals):

[[-1.569 -0.588  0.392  0.392  1.373]
 [-0.816 -0.816 -0.816  1.225  1.225]
 [-1.167 -1.167  0.5   0.5   1.333]]

Because axis=1, each row is standardized using its own mean and standard deviation.
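The sketch below uses the same 3x5 array to compare the three axis settings with equivalent NumPy expressions. axis=1 standardizes rows, axis=0 standardizes columns, and axis=None uses a single mean and standard deviation for the whole array. keepdims=True keeps the per-row statistics as a column, so they broadcast across each row.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

# axis=1: each row is standardized with its own mean and std
by_row = stats.zscore(data, axis=1)
manual_rows = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

# axis=0 (default): each column is standardized instead
by_col = stats.zscore(data, axis=0)
manual_cols = (data - data.mean(axis=0)) / data.std(axis=0)

# axis=None: one mean and one std for the whole array
whole = stats.zscore(data, axis=None)
manual_whole = (data - data.mean()) / data.std()

print(np.allclose(by_row, manual_rows),
      np.allclose(by_col, manual_cols),
      np.allclose(whole, manual_whole))  # True True True
```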

Example 3: Calculating Z-Scores for a Pandas DataFrame

Finally, let’s use the DataFrame.apply method to calculate z-scores column by column in a pandas DataFrame.

import pandas as pd
import numpy as np
import scipy.stats as stats

# Fixed example data (5 rows, 3 columns) so the output is reproducible
data = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [2, 4, 4, 4, 6],
                     'C': [9, 7, 5, 3, 1]})
z_scores = data.apply(stats.zscore)  # stats.zscore is applied to each column
print(z_scores)

This will output:

          A         B         C
0 -1.414214 -1.581139  1.414214
1 -0.707107  0.000000  0.707107
2  0.000000  0.000000  0.000000
3  0.707107  0.000000 -0.707107
4  1.414214  1.581139 -1.414214

Each z-score is calculated relative to its own column.
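The same column-wise z-scores can also be computed with pandas arithmetic alone. The sketch below uses a small fixed DataFrame for illustration. One detail matters for matching SciPy: DataFrame.std defaults to ddof=1 (the sample standard deviation), so the sketch passes ddof=0 to reproduce the population-based default of stats.zscore.

```python
import pandas as pd

data = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                     "B": [2, 4, 4, 4, 6],
                     "C": [9, 7, 5, 3, 1]})

# Subtract each column's mean and divide by its population std.
# pandas' DataFrame.std defaults to ddof=1, so ddof=0 is passed explicitly
# to match scipy.stats.zscore's default.
z_scores = (data - data.mean()) / data.std(ddof=0)
print(z_scores.round(3))
```

Leaving out ddof=0 would still center each column at 0, but every z-score would shrink slightly, because the sample standard deviation is larger than the population one.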