ML 101: Welch’s T-test of Unequal Variance (Full Code)

While you’re probably much more familiar with the Student’s t-test (independent t-test), there is another t-test that doesn’t get nearly enough credit.

The Welch’s t-test is a test that you need in your toolbox if you want to succeed as a data scientist.

While other blogs make this test seem complicated, we break it down, so it’s simple and easy to implement.

In this post, you’ll learn:

  • What the Welch t-test is
  • Assumptions of Welch’s t-test
  • How To Perform The Welch’s t-test
  • Common Questions Answered at The Bottom

Let’s jump right in.

Welch’s t-test

Welch’s t-test is a statistical test used to help analyze two different samples for insights into their original populations.

We usually want to determine if these two samples’ original populations had different or similar means.

Insights into original populations are critical, and comparing the mean of two samples is a good start.

The null hypothesis in Welch’s t-test is that the two population means are equal:

$$H_0: \mu_1 = \mu_2$$

The alternative hypothesis is that these means are not equal:

$$H_a: \mu_1 \neq \mu_2$$

Welch’s t-test does not require the two samples to be the same size, but each sample needs at least two observations so that its variance can be estimated.


Since this test is parametric, some assumptions must be met before you can perform an accurate t-test.

If these assumptions are not met, you can still perform the test, but the result cannot be trusted.

These assumptions are:

  • Both Samples Are Normally Distributed (Normality Assumption)
    • This can easily be tested with a QQ-Plot or a Shapiro-Wilk Test
  • Numerical Values
    • Sadly, unlike tests of homogeneity and other chi-square tests, we can’t utilize categorical variables.
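As a rough sketch of the normality check mentioned above, the snippet below runs a Shapiro-Wilk test on two made-up samples. The helper name `looks_normal`, the 0.05 cutoff, and the synthetic data are assumptions for illustration only, not part of the dataset used later in this post.

```python
import numpy as np
from scipy import stats

def looks_normal(sample, alpha=0.05):
    """Return True when Shapiro-Wilk fails to reject normality at level alpha."""
    statistic, p_value = stats.shapiro(sample)
    return bool(p_value > alpha)

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=50, scale=5, size=200)   # roughly bell-shaped
skewed_sample = rng.exponential(scale=5, size=200)      # strongly right-skewed

print('normal sample passes:', looks_normal(normal_sample))
print('skewed sample passes:', looks_normal(skewed_sample))
```

A sample that fails this check is a hint to transform the data or reach for a nonparametric test instead.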

Note that we do not have the assumption of equal variances.

This is the key difference between Welch’s t-test and Student’s t-test.

We pay for it with our degrees of freedom, which are usually lower than in Student’s t-test. That drives up the critical value and makes it harder to reject the null hypothesis that our population means are equal.


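While the formula looks intimidating at first, our T value can be calculated by the following:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Here $\bar{x}_i$, $s_i^2$, and $n_i$ are the mean, variance, and size of sample $i$.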

Since we know that our test is two-tailed, we’ll need our degrees of freedom to calculate our critical region.

However, if we take each sample’s standard deviation divided by the square root of its sample size and rename it the standard error, we get the following:

$$se_i = \frac{s_i}{\sqrt{n_i}}$$

With that in mind, our Degrees of Freedom formula becomes much simpler:

$$df \approx \frac{\left(se_1^2 + se_2^2\right)^2}{\frac{se_1^4}{n_1 - 1} + \frac{se_2^4}{n_2 - 1}}$$
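To make the formulas concrete, here is a minimal sketch that computes the T value and the degrees of freedom by hand and compares them against `stats.ttest_ind` with `equal_var=False`. The function name `welch_by_hand` and the two synthetic groups are assumptions made up for this illustration.

```python
import numpy as np
from scipy import stats

def welch_by_hand(a, b):
    """Return Welch's T value and degrees of freedom for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # squared standard errors: sample variance (ddof=1) divided by sample size
    se1_sq = a.var(ddof=1) / n1
    se2_sq = b.var(ddof=1) / n2
    t_value = (a.mean() - b.mean()) / np.sqrt(se1_sq + se2_sq)
    dof = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t_value, dof

# Synthetic groups with different sizes and spreads, for illustration only
rng = np.random.default_rng(42)
group_a = rng.normal(10, 1, 30)
group_b = rng.normal(11, 4, 45)

t_manual, dof = welch_by_hand(group_a, group_b)
p_manual = 2 * stats.t.sf(abs(t_manual), dof)   # two-tailed p-value

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f'by hand: T={t_manual:.4f}, df={dof:.2f}, p={p_manual:.5f}')
print(f'scipy:   T={t_scipy:.4f}, p={p_scipy:.5f}')
```

The two lines should agree, which is a handy sanity check that the formulas above match what the library does.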

Now that we have our T statistic and Degrees of Freedom, we can look up the two-tailed critical value in our T-table (the 97.5th percentile when you’re using a significance level of 0.05).

If the absolute value of our T value is greater than the critical value we find in our T-table, we reject the null hypothesis that the population means are equal.

If it is less, we fail to reject the null hypothesis that these population means are equal.
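If you’d rather not dig through a T-table, the lookup can be sketched with `stats.t.ppf`. The helper name `two_tailed_decision` and the example T value and degrees of freedom below are made up for illustration.

```python
from scipy import stats

def two_tailed_decision(t_value, dof, alpha=0.05):
    """Return the two-tailed critical value and whether we reject the null."""
    # each tail holds alpha / 2, so we look up the (1 - alpha / 2) quantile
    critical = stats.t.ppf(1 - alpha / 2, dof)
    return critical, bool(abs(t_value) > critical)

# Hypothetical T value and degrees of freedom
critical, reject = two_tailed_decision(t_value=2.5, dof=40.0)
print(f'critical value: {critical:.3f}')   # roughly 2.02 for 40 degrees of freedom
print('reject null' if reject else 'fail to reject null')
```

Because Welch’s degrees of freedom are rarely a whole number, computing the quantile directly also avoids rounding to the nearest row of a printed table.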

Perform Welch’s t-test in Python

import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import shapiro
import warnings

df = pd.read_csv('brooklyn_listings.csv')

# focus on price and sqft
# take some random 100 row sample
df = df[['price','sqft']].dropna().sample(n=100)

# quickly remove outliers
remove = df.copy()

for col in df.columns:
    remove[col+'_Zscore'] = np.abs(stats.zscore(remove[col]))
    remove[col+'_outlier'] = np.where(remove[col+'_Zscore']>3,1,0)
# quickly filter the outliers out
remove = remove[remove['price_outlier']!=1]
remove = remove[remove['sqft_outlier']!=1]

# back to our original dataframe (copy so we can safely overwrite columns)
df = remove[['price','sqft']].copy()

# make our data normal with a yeojohnson transformation
for col in df.columns:
    df[col], lmbda = stats.yeojohnson(df[col])
    # test normality of each column
    print(col, shapiro(df[col]))

The null hypothesis in the Shapiro-Wilk test is that the distribution is normal.

Since both of our p-values are greater than .05, we fail to reject the null hypothesis.

This means we can treat both sets of our data as approximately normal.

[Image: Shapiro-Wilk output showing both p-values above .05]

With this in mind, we can compare them.


def welchsttest(M1, M2):
    # remember, this is welchs, so we do not assume equal variance
    T, p_value = stats.ttest_ind(M1, M2, equal_var = False)
    print(f'T value {T},\n\np-value {round(p_value,5)}\n')
    if p_value < .05:
        print('Reject Null Hypothesis')
    else:
        print('Fail To Reject Null')

# compare the two transformed columns
welchsttest(df['price'], df['sqft'])

We see that our p-value is very low, and we reject the null hypothesis.

[Image: Welch’s t-test output with the T value and p-value]

We can assume that the means for these populations are not the same.

This makes sense if we remember back to our initial data.

Sqft would never have the same mean as price.

Frequently Asked Questions

While we receive tons of questions here at EML, we sadly can’t get to all of them.

Below, we’ve answered some questions that we’ve received in emails.

We hope that this helps you answer any questions that you have.

Why is Welch’s t-test better than the regular t-test?

The assumption of the equality of variances is sometimes impossible to prove. This makes Welch’s t-test a perfect candidate when you’re trying to understand the means of the population.

While both of these are two-sample tests, our degrees of freedom in Welch’s t-test will usually be lower than in Student’s t-test when the variances are unequal.

Because of this, our critical value will be higher, making it harder to reject the null hypothesis that our means are equal.

When should you use Welch’s t-test compared to the independent t-test?

You should use Welch’s t-test whenever your samples have unequal variance, or when you can’t be sure the variances are equal.
If your samples truly have equal variance, the independent t-test has slightly more power, although Welch’s t-test still gives a valid result.
You could also utilize Welch’s t-test when you want a more conservative critical value.
Since Welch’s t-test never has more degrees of freedom than the independent t-test, its critical value will be at least as high.
This makes it harder to reject the null hypothesis for a given T value.
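The sketch below puts the two tests side by side on made-up data where the smaller group is also the noisier one. The variable names and the synthetic samples are assumptions for illustration. It compares the degrees of freedom and the resulting two-tailed critical values at a 0.05 significance level.

```python
import numpy as np
from scipy import stats

# Synthetic samples, for illustration only: small and noisy vs. large and tight
rng = np.random.default_rng(7)
small_noisy = rng.normal(100, 30, 15)
large_tight = rng.normal(100, 5, 60)

n1, n2 = small_noisy.size, large_tight.size
v1 = small_noisy.var(ddof=1) / n1   # squared standard error of group 1
v2 = large_tight.var(ddof=1) / n2   # squared standard error of group 2

dof_student = n1 + n2 - 2
dof_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

crit_student = stats.t.ppf(0.975, dof_student)
crit_welch = stats.t.ppf(0.975, dof_welch)

_, p_student = stats.ttest_ind(small_noisy, large_tight, equal_var=True)
_, p_welch = stats.ttest_ind(small_noisy, large_tight, equal_var=False)

print(f'Student: df={dof_student}, critical={crit_student:.3f}, p={p_student:.4f}')
print(f'Welch:   df={dof_welch:.2f}, critical={crit_welch:.3f}, p={p_welch:.4f}')
```

Note that the two tests also use different standard errors, so the T values themselves can differ, not just the critical values.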

What is the critical value used for in the Welch test?

Your critical value in the Welch test is the value we use to decide if we are going to reject the null hypothesis or not.

When the absolute value of our calculated T value exceeds our critical value, we reject the null hypothesis that the population means are equal.

When it does not, we fail to reject the null hypothesis. This doesn’t prove the means are equal; it only means we lack the evidence to say they differ.

Is Welch’s t-test two-tailed?

As used in this post, Welch’s t-test is two-tailed. This is because our alternative hypothesis is that the means are not equal, so a difference in either direction counts as evidence against the null hypothesis. (A one-tailed version exists for when you only care about one direction.)

When comparing means using this test statistic, you’re checking whether one mean is significantly higher than the other (right tail) or significantly lower than the other (left tail).

For example, if you’re using a significance level of 0.05, each of your tails will cover .025 of the total area under the curve.

If the value falls into these regions, we have enough evidence to reject the null hypothesis and assume that the population means are different.

See below for an image:


[Image: two-tailed normal distribution test]

The easiest way to understand if a test is two-tailed is by asking yourself if the effect can be felt in multiple directions or in one.

For example, a mean can be much higher or much lower than another mean.
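To see how the tails relate to each other, here is a small sketch using the `alternative` argument of `stats.ttest_ind`. This assumes SciPy 1.6 or newer, where that argument is available, and the two samples are synthetic and made up for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic samples, for illustration only
rng = np.random.default_rng(1)
a = rng.normal(10, 2, 40)
b = rng.normal(11, 5, 55)

# same Welch test, three different alternative hypotheses
two_sided = stats.ttest_ind(a, b, equal_var=False, alternative='two-sided')
greater = stats.ttest_ind(a, b, equal_var=False, alternative='greater')  # mean(a) > mean(b)
less = stats.ttest_ind(a, b, equal_var=False, alternative='less')        # mean(a) < mean(b)

print(f'two-sided p: {two_sided.pvalue:.5f}')
print(f'greater p:   {greater.pvalue:.5f}')
print(f'less p:      {less.pvalue:.5f}')
```

The two-sided p-value is twice the smaller of the two one-sided p-values, which is the "effect in either direction" idea expressed in numbers.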

Is Welch’s t-test nonparametric?

Since both of our samples have to be normally distributed, Welch’s t-test is parametric.

While this does mean we won’t be able to apply it in all cases, it’s still a very powerful statistic whenever those assumptions are met.



Stewart Kaplan