Calculating Z-Scores in Python: A Step-by-Step Guide

by finnstats

Z-scores are a fundamental concept in statistics. A z-score measures how many standard deviations a value lies from the mean of its data set: z = (x - mean) / standard deviation.

Calculating Z-Scores in Python

In this article, we’ll explore how to calculate z-scores in Python using various libraries and data structures.

Using SciPy’s zscore Function

The zscore function in SciPy’s stats module calculates z-scores for one-dimensional or multi-dimensional arrays. It takes the following arguments:

a: an array-like object containing the data
axis: the axis along which to calculate the z-scores (default is 0)
ddof: degrees of freedom correction in the calculation of the standard deviation (default is 0)
nan_policy: how to handle NaN values; one of 'propagate', 'raise', or 'omit' (default is 'propagate', which returns NaN)
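As a rough illustration of how ddof and nan_policy change the result, the sketch below uses a small made-up sample. The values and variable names are assumptions chosen for the example and do not come from the article’s data.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, chosen only for illustration.
sample = np.array([2.0, 4.0, 4.0, 6.0, 9.0])

# ddof=0 (the default) divides by n: population standard deviation.
z_population = stats.zscore(sample)

# ddof=1 divides by n - 1: sample standard deviation, so the
# z-scores are slightly smaller in magnitude.
z_sample = stats.zscore(sample, ddof=1)

# With the default nan_policy="propagate", one NaN in the input
# makes every z-score NaN, because the mean itself becomes NaN.
with_nan = np.array([2.0, 4.0, np.nan, 6.0, 9.0])
z_propagate = stats.zscore(with_nan)

# nan_policy="omit" ignores the NaN when computing the mean and
# standard deviation; the NaN position stays NaN in the output.
z_omit = stats.zscore(with_nan, nan_policy="omit")

print(z_population)
print(z_sample)
print(z_propagate)
print(z_omit)
```

If the data is a sample from a larger population, ddof=1 is often the more appropriate choice; the default of 0 treats the array as the whole population.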

Example 1: Calculating Z-Scores for a One-Dimensional Numpy Array

Let’s start with a simple example using a one-dimensional numpy array.

import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
z_scores = stats.zscore(data)
print(z_scores)

This will output (values rounded to three decimal places):

[-1.394 -1.195 -1.195 -0.199  0.    0.    0.398  0.598  1.195  1.793]

Each z-score tells us how many standard deviations away an individual value is from the mean.
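To make the definition concrete, the following sketch recomputes the same z-scores by hand with NumPy and compares them with SciPy’s result. The names mean, std, and manual_z are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

mean = data.mean()   # 13.0
std = data.std()     # population standard deviation (ddof=0), matching zscore's default

# z = (x - mean) / standard deviation, applied element-wise
manual_z = (data - mean) / std

print(np.round(manual_z, 3))
print(np.allclose(manual_z, stats.zscore(data)))  # True
```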

Example 2: Calculating Z-Scores for a Multi-Dimensional Numpy Array

With a multi-dimensional array, the axis parameter specifies the axis along which the z-scores are calculated. For example, axis=1 standardizes each row separately:


data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])
z_scores = stats.zscore(data, axis=1)
print(z_scores)

This will output (values rounded to three decimal places):

[[-1.569 -0.588  0.392  0.392  1.373]
 [-0.816 -0.816 -0.816  1.225  1.225]
 [-1.167 -1.167  0.5   0.5   1.333]]

Each z-score is calculated relative to the mean and standard deviation of its own row.
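For comparison, here is a short sketch of how the other axis settings behave on the same array. It assumes nothing beyond the documented axis parameter; the variable names z_rows, z_cols, and z_all are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

z_rows = stats.zscore(data, axis=1)     # each row standardized on its own
z_cols = stats.zscore(data, axis=0)     # the default: each column on its own
z_all = stats.zscore(data, axis=None)   # the whole array treated as one sample

# After standardizing along an axis, the mean along that axis is
# (numerically) 0 and the population standard deviation is 1.
print(np.round(z_rows.mean(axis=1), 3), np.round(z_rows.std(axis=1), 3))
print(np.round(z_cols.mean(axis=0), 3), np.round(z_cols.std(axis=0), 3))
print(round(float(z_all.mean()), 3), round(float(z_all.std()), 3))
```

Which axis to use depends on what one observation is: if each row is a separate series, axis=1 fits; if each column is a variable measured across rows, the default axis=0 is usually what you want.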

Example 3: Calculating Z-Scores for a Pandas DataFrame

Finally, let’s use the DataFrame’s apply method to calculate z-scores column by column in a pandas DataFrame.

import pandas as pd
import numpy as np
import scipy.stats as stats

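# 5 rows x 3 columns of random integers from 0 to 9 (different on every run)
data = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])
# apply passes each column to zscore in turn
z_scores = data.apply(stats.zscore)
print(z_scores)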

The exact values change on every run because the data is generated randomly without a seed. The result is a DataFrame with the same index and column labels as the input, and the z-scores in each non-constant column have a mean of 0 and a population standard deviation of 1.

Each z-score is calculated relative to its own column.
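The same column-wise calculation can also be written with pandas arithmetic alone. The sketch below uses a small fixed DataFrame (hypothetical values, chosen so the example is reproducible) and checks that both approaches agree. Note that pandas’ std uses ddof=1 by default, so ddof=0 has to be passed explicitly to match zscore’s default.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical fixed data so the result is the same on every run.
df = pd.DataFrame({"A": [1, 4, 6, 9, 5],
                   "B": [0, 2, 2, 7, 3],
                   "C": [8, 1, 3, 3, 6]})

# Approach from the article: apply SciPy's zscore to each column.
z_scipy = df.apply(stats.zscore)

# pandas-only equivalent: subtract column means, divide by column
# population standard deviations (ddof=0 to match zscore's default).
z_pandas = (df - df.mean()) / df.std(ddof=0)

print(z_pandas.round(3))
print(np.allclose(z_scipy, z_pandas))  # True
```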

Conclusion

Calculating z-scores in Python is straightforward with SciPy’s zscore function, either called directly on NumPy arrays or applied column by column through the apply method of a pandas DataFrame.

By following these examples, you can calculate z-scores for your own data and see how far each value lies from the mean in units of standard deviation.
