# Calculating Z-Scores in Python: A Step-by-Step Guide

Z-scores are a fundamental concept in statistics. A z-score measures how many standard deviations a value lies from the mean of its data set.


In this article, we’ll explore how to calculate z-scores in Python using various libraries and data structures.

## Using SciPy’s `zscore` Function

The `zscore` function in SciPy’s `stats` module calculates z-scores for one-dimensional or multi-dimensional arrays. Its main arguments are:

• `a`: an array-like object containing the data
• `axis`: the axis along which to calculate the z-scores (default is 0)
• `ddof`: degrees of freedom correction in the calculation of the standard deviation (default is 0)
• `nan_policy`: how to handle NaN values (default is `propagate`, which returns NaN; `raise` throws an error, and `omit` ignores NaN values in the calculation)
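The effect of `ddof` and `nan_policy` is easiest to see on a small array. The following is a minimal sketch; the arrays `data` and `with_nan` hold made-up values chosen for illustration and do not come from this article's examples.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data with mean 5 and population standard deviation 2
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# ddof=0 (the default) divides by n: population standard deviation
z_population = stats.zscore(data)

# ddof=1 divides by n - 1: sample standard deviation, so |z| shrinks slightly
z_sample = stats.zscore(data, ddof=1)

print(z_population)
print(z_sample)

# Hypothetical data containing a missing value
with_nan = np.array([1.0, 2.0, np.nan, 4.0])

# Default 'propagate': a single NaN makes every z-score NaN
z_propagate = stats.zscore(with_nan)

# 'omit': NaN is ignored for the mean and standard deviation,
# but the NaN position itself stays NaN in the result
z_omit = stats.zscore(with_nan, nan_policy='omit')

print(z_propagate)
print(z_omit)
```

With the default `ddof=0`, the first array standardizes to -1.5, -0.5, -0.5, -0.5, 0, 0, 1, and 2, because its mean is 5 and its population standard deviation is 2.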

### Example 1: Calculating Z-Scores for a One-Dimensional NumPy Array

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])
z_scores = stats.zscore(data)
print(z_scores)
```

This will output (values rounded to three decimal places):

``[-1.394 -1.195 -1.195 -0.199  0.    0.    0.398  0.598  1.195  1.793]``

Each z-score tells us how many standard deviations away an individual value is from the mean.
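The same numbers can be reproduced by hand from the definition z = (x - mean) / standard deviation. This is a minimal sketch, assuming the population standard deviation (`ddof=0`), which matches the `zscore` default:

```python
import numpy as np
import scipy.stats as stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

mean = data.mean()        # 13.0
std = data.std(ddof=0)    # population standard deviation, about 5.02

# Apply the definition element-wise
manual_z = (data - mean) / std

# Check the hand-computed values against SciPy's result
matches = np.allclose(manual_z, stats.zscore(data))
print(matches)
print(np.round(manual_z, 3))
```

The value 13 equals the mean, which is why it gets a z-score of 0, while 22 lies about 1.79 standard deviations above the mean.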

### Example 2: Calculating Z-Scores for a Multi-Dimensional NumPy Array

What if we have a multi-dimensional array? The `axis` parameter specifies the axis along which the z-scores are calculated. For example, `axis=1` standardizes each row separately:


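```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])
# axis=1: standardize each row against its own mean and standard deviation
z_scores = stats.zscore(data, axis=1)
print(z_scores)
```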

This will output (values rounded to three decimal places):

```
[[-1.569 -0.588  0.392  0.392  1.373]
 [-0.816 -0.816 -0.816  1.225  1.225]
 [-1.167 -1.167  0.5    0.5    1.333]]
```

With `axis=1`, each z-score is calculated relative to the mean and standard deviation of its own row.
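To see how the choice of `axis` changes the result, the sketch below standardizes the same array three ways. It assumes the documented behavior that `axis=0` works column by column and `axis=None` treats the whole array as one data set; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

z_cols = stats.zscore(data, axis=0)     # default: standardize each column
z_rows = stats.zscore(data, axis=1)     # standardize each row
z_all = stats.zscore(data, axis=None)   # standardize against the whole array

# Along the standardized direction, the mean is 0 and the
# population standard deviation is 1
cols_ok = np.allclose(z_cols.mean(axis=0), 0) and np.allclose(z_cols.std(axis=0), 1)
rows_ok = np.allclose(z_rows.mean(axis=1), 0) and np.allclose(z_rows.std(axis=1), 1)
all_ok = np.isclose(z_all.mean(), 0) and np.isclose(z_all.std(), 1)

print(cols_ok, rows_ok, all_ok)
```

The same value can receive different z-scores under each setting, because the reference mean and standard deviation change with the axis.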

### Example 3: Calculating Z-Scores for a Pandas DataFrame

Finally, let’s use the DataFrame `apply` method to calculate z-scores for the values in a pandas DataFrame. By default, `apply` passes each column to the function in turn.

```python
import pandas as pd
import numpy as np
import scipy.stats as stats

data = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])
z_scores = data.apply(stats.zscore)
print(z_scores)
```

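Because the data is generated randomly and no seed is set, the exact values differ on every run. The result is always a DataFrame with the same index and the columns `A`, `B`, and `C`, in which every column has a mean of 0 and a population standard deviation of 1. A column whose values are all identical has a standard deviation of 0 and yields NaN.

Each z-score is calculated relative to its own column.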
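For a result that can be checked by hand, the sketch below uses a small fixed DataFrame instead of random data and compares SciPy's result with plain pandas arithmetic. The DataFrame `df` and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd
import scipy.stats as stats

# Hypothetical fixed data so the result is reproducible
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 60]})

# SciPy applied column by column
z_scipy = df.apply(stats.zscore)

# The same calculation with pandas arithmetic; ddof=0 is passed explicitly
# because DataFrame.std defaults to ddof=1 (sample standard deviation)
z_pandas = (df - df.mean()) / df.std(ddof=0)

print(z_pandas.round(3))

# Largest absolute difference between the two approaches
max_diff = (z_scipy - z_pandas).abs().max().max()
print(max_diff)
```

Column `A` has mean 2 and population standard deviation of about 0.816, so its z-scores are roughly -1.225, 0, and 1.225. Omitting `ddof=0` in the pandas version would give smaller magnitudes that no longer match `zscore`.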

## Conclusion

Calculating z-scores in Python is straightforward with SciPy’s `zscore` function, either called directly on NumPy arrays or passed to the `apply` method of a pandas DataFrame.

By following these examples, you can standardize your own data and see how far each value lies from its mean, measured in standard deviations.
