Estimating the Average Treatment Effect (ATE) with DoWhy

by finnstats

Imagine a government launches a job training program aimed at helping unemployed individuals earn higher wages. After a year, participants appear to be earning more money than non-participants.

But here’s the critical question:

Did the training program actually cause the increase in earnings, or were participants already different from those who didn’t enroll?

This is the type of question causal inference is designed to answer. Rather than only identifying patterns or correlations, causal inference asks what would have happened had the treatment never occurred.

One of the most important measures in causal analysis is the Average Treatment Effect (ATE). In this tutorial, we’ll use Python’s DoWhy library and the famous Lalonde dataset to estimate the average impact of a job training program on future earnings.

Understanding the Average Treatment Effect

The Average Treatment Effect represents the average change in an outcome that can be attributed to a treatment or intervention.

In simple terms, it answers:

How much difference did the treatment make across the entire population?

For the Lalonde study, the treatment is participation in a job training program, while the outcome is income earned after the program.

If the ATE is positive, the training appears to improve earnings. If it’s negative, participation may have reduced earnings. An ATE close to zero suggests little or no measurable impact.
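The definition above can be written compactly in potential-outcomes notation. The symbols are introduced here for illustration and are not part of the original text: Y(1) is a person’s earnings with training, Y(0) is the same person’s earnings without it, and T indicates participation.

```latex
\mathrm{ATE} = \mathbb{E}\big[\,Y(1) - Y(0)\,\big]
\qquad\text{whereas the naive comparison is}\qquad
\mathbb{E}[\,Y \mid T = 1\,] - \mathbb{E}[\,Y \mid T = 0\,]
```

The two quantities coincide when treatment is independent of the potential outcomes, as in a randomized experiment. In general they can differ.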

Loading a Real-World Dataset

Rather than working with a synthetic example, we’ll use the Lalonde dataset, one of the most widely referenced datasets in causal inference research.

import pandas as pd
from dowhy import CausalModel
import dowhy.datasets

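# In recent DoWhy releases, lalonde_dataset() downloads the data and
# returns a pandas DataFrame directly (network access is required).
df = dowhy.datasets.lalonde_dataset()

# Check the column names; the treatment indicator is expected to be "treat".
print(df.columns.tolist())
print(df.head())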

The dataset contains information about individuals who either participated in the training program or did not.

Key variables include:

treat – Indicates program participation (the treatment)
re78 – Earnings after the program
re74 and re75 – Earnings before the program
Demographic characteristics such as age, education, race, and marital status

These additional variables are important because they may influence both treatment assignment and future earnings.

Why Correlation Isn’t Enough

Suppose we simply compare average earnings between participants and non-participants.

The result might suggest that the program worked.

However, participants could differ from non-participants in several ways:

Younger individuals may be more likely to enroll.
People with lower previous earnings may seek additional training.
Education levels may influence both participation and future income.

Because of these differences, a direct comparison can produce biased conclusions.

This is where causal modeling becomes valuable.
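To make the bias concrete, here is a minimal simulation sketch. It does not use the Lalonde data: the variable names, the selection rule, and the true effect of 1000 are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical confounder: earnings before the program (dollars).
prior = rng.normal(5000, 2000, n)

# Selection into treatment: lower prior earnings -> more likely to enroll.
p_enroll = 1 / (1 + np.exp((prior - 5000) / 1000))
treated = rng.random(n) < p_enroll

# Outcome: the true effect of training is 1000 by construction.
TRUE_EFFECT = 1000.0
earnings = 2000 + 0.8 * prior + TRUE_EFFECT * treated + rng.normal(0, 1500, n)

# Naive comparison: difference in group means, ignoring the confounder.
naive_diff = earnings[treated].mean() - earnings[~treated].mean()

print(f"true effect: {TRUE_EFFECT:.0f}")
print(f"naive difference in means: {naive_diff:.0f}")
```

In this setup the naive difference falls far below the true effect because enrollees started from lower earnings, so the raw comparison mixes the program’s effect with pre-existing differences.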

Building the Causal Model

Before estimating any effect, we must explicitly describe our assumptions about the data-generating process.

model = CausalModel(
    data=df,
    treatment="treat",  # treatment indicator column in DoWhy's Lalonde data
    outcome="re78",
    common_causes=[
        "age",
        "educ",
        "black",
        "hisp",
        "married",
        "nodegr",
        "re74",
        "re75"
    ]
)

Here, we’re telling DoWhy that these variables may influence both participation in the training program and later earnings.

By accounting for these confounding factors, we move closer to estimating a genuine causal effect rather than a misleading association.

Verifying That the Effect Can Be Identified

A crucial step in causal inference is determining whether the effect can actually be estimated from the available data.

DoWhy performs this automatically.

identified_estimand = model.identify_effect()

print(identified_estimand)

During this stage, the library examines the causal assumptions and determines whether a valid adjustment strategy exists.

In this example, DoWhy typically identifies a backdoor adjustment, meaning that controlling for the specified confounders is sufficient to estimate the Average Treatment Effect, provided no unobserved variable also influences both participation and earnings.
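For reference, the standard backdoor adjustment formula can be sketched as follows. The notation is an assumption introduced here: W stands for the listed common causes, T for participation, and Y for later earnings.

```latex
\mathrm{ATE} = \mathbb{E}_{W}\Big[\, \mathbb{E}[\,Y \mid T = 1, W\,] - \mathbb{E}[\,Y \mid T = 0, W\,] \Big]
```

The formula holds only if W blocks every backdoor path between T and Y, and if both treated and untreated individuals exist at each value of W.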

This process removes much of the mathematical complexity traditionally associated with causal inference.

Calculating the Average Treatment Effect

Once the effect is identified, estimation becomes straightforward.

estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.linear_regression"
)

print(estimate.value)

This approach uses linear regression while adjusting for all specified confounding variables.

The resulting value represents the estimated Average Treatment Effect.

Suppose the output is:

1675.3

Under the model’s assumptions, this would indicate that participation in the training program increased 1978 earnings (re78) by approximately $1,675 on average.

Importantly, this figure reflects the average impact across all individuals in the study. It does not imply that every participant earned exactly $1,675 more.

Some may have benefited substantially, while others experienced little change.
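The sketch below illustrates both points on simulated data: regression adjustment recovers an average effect even though individual effects vary. It is a hand-rolled illustration with NumPy, not DoWhy’s internal implementation, and every number in it (the confounder, the mean effect of 1000, the spread of 800) is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical confounder (prior earnings) driving both enrollment and outcome.
prior = rng.normal(5000, 2000, n)
treated = rng.random(n) < 1 / (1 + np.exp((prior - 5000) / 1000))

# Individual effects differ from person to person; their mean is 1000 by construction.
individual_effect = rng.normal(1000, 800, n)
earnings = 2000 + 0.8 * prior + individual_effect * treated + rng.normal(0, 1500, n)

# Regression adjustment: OLS of the outcome on [intercept, treatment, confounder].
X = np.column_stack([np.ones(n), treated.astype(float), prior])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
ate_hat = coef[1]  # coefficient on the treatment indicator

share_harmed = (individual_effect < 0).mean()
print(f"estimated ATE: {ate_hat:.0f}")
print(f"share of individuals with a negative effect: {share_harmed:.2f}")
```

The treatment coefficient lands near the average effect, while a noticeable share of simulated individuals have a negative effect. This simple recovery relies on the individual effects being unrelated to the confounder. When they are related, a single regression coefficient need not equal the ATE.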

Exploring an Alternative: Propensity Score Matching

Causal inference rarely relies on a single estimation technique.

One popular alternative is Propensity Score Matching (PSM), which attempts to create comparable treatment and control groups.

estimate_psm = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching"
)

print(estimate_psm.value)

Rather than fitting a regression model, PSM first estimates each individual’s likelihood of receiving treatment based on observed characteristics.

Individuals with similar probabilities are then matched before outcomes are compared.

This approach often produces estimates that differ from regression-based methods, sometimes noticeably. That is expected because each method relies on different assumptions and modeling strategies.
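The two steps described above (estimate the propensity score, then match on it) can be sketched by hand as follows. This is a simplified illustration on simulated data, not DoWhy’s implementation: the helper names fit_logistic and nearest_outcome are hypothetical, matching is one-to-one nearest neighbor with replacement, and the true effect of 1000 is fixed by construction.

```python
import numpy as np

def fit_logistic(X, t, iters=25):
    """Logistic regression via Newton's method; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

def nearest_outcome(scores, pool_scores, pool_y):
    """Outcome of the pool unit whose score is closest to each entry of `scores`."""
    order = np.argsort(pool_scores)
    s, y = pool_scores[order], pool_y[order]
    idx = np.searchsorted(s, scores).clip(1, len(s) - 1)
    left_closer = (scores - s[idx - 1]) <= (s[idx] - scores)
    return np.where(left_closer, y[idx - 1], y[idx])

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(0, 1, n)                        # hypothetical standardized confounder
treated = rng.random(n) < 1 / (1 + np.exp(z))  # lower z -> more likely treated
y = 5000 + 1500 * z + 1000 * treated + rng.normal(0, 1000, n)  # true effect: 1000

# Step 1: estimate each unit's propensity score from the observed covariate.
X = np.column_stack([np.ones(n), z])
score = 1 / (1 + np.exp(-X @ fit_logistic(X, treated.astype(float))))

# Step 2: impute each unit's missing potential outcome from its nearest match
# (by propensity score, with replacement) in the opposite group.
y1 = np.where(treated, y, nearest_outcome(score, score[treated], y[treated]))
y0 = np.where(treated, nearest_outcome(score, score[~treated], y[~treated]), y)

# Step 3: average the individual differences to get the matched ATE estimate.
ate_psm = (y1 - y0).mean()
print(f"matched ATE estimate: {ate_psm:.0f}")
```

Matching with replacement keeps the sketch short. Production implementations typically add refinements such as calipers or multiple matches per unit.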

What Makes the Lalonde Dataset So Important?

The Lalonde dataset has become a benchmark for evaluating causal inference techniques because it closely resembles the challenges analysts face in real-world settings.

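The underlying study, the National Supported Work demonstration, was a randomized experiment, which gives researchers an experimental benchmark. Versions of the data that pair the treated group with non-experimental comparison samples pose the harder problem: treatment assignment is not random, multiple confounding factors are present, and the true impact of the intervention is not immediately obvious.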

For these reasons, researchers frequently use the dataset to compare matching methods, regression approaches, propensity score techniques, and modern machine learning-based causal estimators.

Final Thoughts

The Average Treatment Effect is one of the foundational concepts in causal inference because it transforms a vague question (“Did the program work?”) into a measurable quantity.

Using DoWhy, the workflow becomes intuitive:

Define the causal assumptions.
Identify whether the effect is estimable.
Choose an estimation method.
Interpret the resulting causal effect.

The real value lies not in running a few lines of code, but in understanding the assumptions behind the estimate. When those assumptions are reasonable, the ATE provides a powerful way to quantify how interventions, policies, treatments, or business decisions influence outcomes in the real world.

As causal inference continues to gain adoption across data science, economics, healthcare, and machine learning, mastering concepts like the Average Treatment Effect is becoming an increasingly valuable skill for modern analysts.
