Calculating Z-Scores in Python: A Step-by-Step Guide

Z-scores are a fundamental concept in statistics: a z-score measures how many standard deviations a value lies above or below the mean.
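In formula form, z = (x - mean) / standard deviation. As a minimal sketch of that definition, the snippet below applies it with only Python's standard library to the values used in Example 1 below. It assumes the population standard deviation (dividing by n), which matches SciPy's zscore with its default ddof=0.

```python
import statistics

# Values from Example 1 below
data = [6, 7, 7, 12, 13, 13, 15, 16, 19, 22]

mean = statistics.mean(data)   # 13
sd = statistics.pstdev(data)   # population standard deviation: divides by n, like ddof=0

# z = (x - mean) / sd, applied to every value
z_scores = [(x - mean) / sd for x in data]

print(round(z_scores[-1], 3))  # 1.793 -> 22 lies about 1.79 standard deviations above the mean
```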


In this article, we’ll explore how to calculate z-scores in Python with SciPy, applying it to NumPy arrays and pandas DataFrames.

Using SciPy’s zscore Function

The zscore function in SciPy’s stats module provides a convenient way to calculate z-scores for one-dimensional or multi-dimensional arrays. Its signature is zscore(a, axis=0, ddof=0, nan_policy='propagate'), and it takes the following arguments:

  • a: an array-like object containing the data
  • axis: the axis along which to calculate the z-scores (default is 0; None uses the whole array)
  • ddof: degrees of freedom correction in the calculation of the standard deviation (default is 0, i.e. the population standard deviation)
  • nan_policy: how to handle NaN values: 'propagate' (the default) returns NaN, 'raise' throws an error, and 'omit' ignores NaNs when computing the z-scores
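To see the ddof and nan_policy arguments in action, here is a short sketch using the data from Example 1 below plus a small made-up array that contains a NaN. Switching to ddof=1 (the sample standard deviation) shrinks every z-score by the same factor, sqrt((n - 1) / n).

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

# Default ddof=0: population standard deviation (divide by n)
z_pop = stats.zscore(data)

# ddof=1: sample standard deviation (divide by n - 1), so each |z| is slightly smaller
z_sample = stats.zscore(data, ddof=1)

# A made-up array with one missing value
with_nan = np.array([6.0, 7.0, np.nan, 12.0, 13.0])

z_propagate = stats.zscore(with_nan)              # default 'propagate': every entry is nan
z_omit = stats.zscore(with_nan, nan_policy='omit') # nan stays in place; the rest ignore it

print(z_propagate)
print(z_omit)
```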

Example 1: Calculating Z-Scores for a One-Dimensional NumPy Array

Let’s start with a simple example using a one-dimensional NumPy array.

import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
z_scores = stats.zscore(data)
print(z_scores)

This prints the following (rounded here to three decimal places; NumPy displays more digits by default):

[-1.394 -1.195 -1.195 -0.199  0.    0.    0.398  0.598  1.195  1.793]

Each z-score tells us how many standard deviations away an individual value is from the mean.
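A quick way to confirm what zscore computes is to reproduce it with plain NumPy: subtract the mean and divide by np.std, which also defaults to ddof=0. The sketch then reads the z-scores back, picking out the values that lie more than one standard deviation from the mean.

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

# Manual version: np.std uses ddof=0 by default, matching stats.zscore
manual = (data - data.mean()) / data.std()
print(np.allclose(manual, stats.zscore(data)))  # True

# Values more than one standard deviation away from the mean (|z| > 1)
far = data[np.abs(manual) > 1]
print(far)  # [ 6  7  7 19 22]
```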

Example 2: Calculating Z-Scores for a Multi-Dimensional NumPy Array

What if we have a multi-dimensional array? The axis parameter controls the direction in which the mean and standard deviation are computed. Setting axis=1 standardizes each row separately:


data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])
z_scores = stats.zscore(data, axis=1)
print(z_scores)

This prints the following (rounded to three decimal places):

[[-1.569 -0.588  0.392  0.392  1.373]
 [-0.816 -0.816 -0.816  1.225  1.225]
 [-1.167 -1.167  0.5   0.5   1.333]]

Each z-score is calculated relative to the mean and standard deviation of its own row.
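To see how the axis argument changes the result, the sketch below standardizes the same 3-by-5 array by row (axis=1), by column (axis=0, the default), and over all 15 values at once (axis=None). The checks at the end confirm that each group ends up with mean 0 and standard deviation 1.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

by_row = stats.zscore(data, axis=1)      # each row standardized on its own
by_column = stats.zscore(data, axis=0)   # each column standardized on its own (the default)
overall = stats.zscore(data, axis=None)  # one mean and one standard deviation for all values

# Every row of by_row has mean ~0 and standard deviation ~1
print(np.allclose(by_row.mean(axis=1), 0), np.allclose(by_row.std(axis=1), 1))  # True True
# Every column of by_column has mean ~0
print(np.allclose(by_column.mean(axis=0), 0))  # True
print(overall.shape)  # (3, 5)
```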

Example 3: Calculating Z-Scores for a Pandas DataFrame

Finally, let’s calculate z-scores for a pandas DataFrame by passing stats.zscore to the DataFrame’s apply method, which runs it on each column.

import pandas as pd
import numpy as np
import scipy.stats as stats

# 5 rows x 3 columns of random integers from 0 to 9
data = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])
# apply() calls stats.zscore once per column
z_scores = data.apply(stats.zscore)
print(z_scores)

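Because the DataFrame is filled with random integers, the exact numbers change on every run. Whatever the draw, each column of z_scores has a mean of 0 and a population standard deviation of 1. (If a column happens to contain the same value in all five rows, its standard deviation is 0 and that column’s z-scores come out as NaN.)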

Each z-score is calculated relative to its own column.
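The same column-wise result can be reproduced with pandas alone, with one catch: pandas’ std() defaults to ddof=1 (the sample standard deviation), whereas stats.zscore defaults to ddof=0. The sketch below uses a small hypothetical DataFrame with fixed values so the comparison is reproducible, and passes ddof=0 explicitly so the two approaches agree.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical fixed data (no randomness, no constant columns)
df = pd.DataFrame({'A': [1, 4, 6, 8, 11],
                   'B': [3, 3, 5, 9, 10],
                   'C': [0, 2, 2, 7, 9]})

# SciPy, column by column (population standard deviation, ddof=0)
z_scipy = df.apply(stats.zscore)

# pandas-only equivalent; ddof=0 overrides pandas' default of ddof=1
z_pandas = (df - df.mean()) / df.std(ddof=0)

print(np.allclose(z_scipy, z_pandas))  # True
```

Leaving out ddof=0 in the pandas version would divide by the slightly larger sample standard deviation, so every z-score would be a little closer to zero than SciPy’s.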

Conclusion

Calculating z-scores in Python is straightforward with SciPy’s zscore function, which works directly on NumPy arrays and, through apply, on each column of a pandas DataFrame.

By following these examples, you can calculate z-scores for your own data and see at a glance how far each observation lies from the mean, measured in standard deviations.
