How to Perform Hypothesis Testing using SciPy?
Python | Hypothesis Testing using SciPy: In this tutorial, we will learn how to perform various hypothesis tests using SciPy, with the help of examples.
By Pranit Sharma Last updated : June 07, 2023
Hypothesis Testing
Hypothesis testing is a statistical method that is used to make conclusions about a population based on sample data. It involves evaluating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (H1), and using statistical techniques to evaluate the evidence against the null hypothesis.
Hypothesis Testing using SciPy
Different types of hypothesis tests can be performed using their corresponding SciPy functions. Some of these tests are the one-sample t-test, independent samples t-test, chi-square test for independence, one-way ANOVA, Kruskal-Wallis H-test, Mann-Whitney U-test, and Wilcoxon signed-rank test.
Here, we will mainly focus on one-sample t-test, independent samples t-test, and chi-square test for independence.
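For reference, the other tests named above map to scipy.stats functions as in the short sketch below. The three groups are made-up illustrative values, not data from this tutorial, and the Wilcoxon call assumes that group_a and group_b are paired measurements of the same subjects.

```python
from scipy import stats

# Illustrative (made-up) measurements; not data from this tutorial.
group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
group_b = [11.2, 11.0, 11.8, 12.0, 11.6, 10.9]
group_c = [13.4, 12.8, 13.9, 13.1, 12.6, 13.7]

p_values = {}

# One-way ANOVA: compares the means of two or more independent groups.
_, p_values["One-way ANOVA"] = stats.f_oneway(group_a, group_b, group_c)

# Kruskal-Wallis H-test: rank-based alternative to one-way ANOVA.
_, p_values["Kruskal-Wallis H-test"] = stats.kruskal(group_a, group_b, group_c)

# Mann-Whitney U-test: rank-based comparison of two independent samples.
_, p_values["Mann-Whitney U-test"] = stats.mannwhitneyu(
    group_a, group_b, alternative="two-sided"
)

# Wilcoxon signed-rank test: assumes group_a and group_b are paired.
_, p_values["Wilcoxon signed-rank test"] = stats.wilcoxon(group_a, group_b)

for test_name, p in p_values.items():
    print(f"{test_name}: p-value = {p:.4f}")
```

Each of these functions returns a result that unpacks into a test statistic and a p-value, so the p-value can be compared to a significance level in the same way as for the tests covered below.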
Steps for Hypothesis Testing using SciPy
It is important to note that no matter which hypothesis test is applied to the data, there is a common series of steps that has to be followed in each test. These steps are as follows:
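- Define the null and alternative hypotheses. The null hypothesis (H0) states the default claim of no effect or no difference (for example, that the population mean equals a given value), and the alternative hypothesis (H1) states the claim we are looking for evidence to support.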
- Prepare the data.
- Choose an appropriate test statistic.
- Compute the test statistic and p-value. The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed.
- Determine the significance level (α) for our test. Commonly used significance levels are 0.05 (5%) or 0.01 (1%).
- Compare the p-value to the significance level. If the p-value is less than α, reject the null hypothesis; otherwise, fail to reject it.
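The steps above can be sketched as a small script. This is only a minimal illustration: the helper name decide() and the sample values are hypothetical, and the one-sample t-test is used simply as an example of a test that returns a statistic and a p-value.

```python
import numpy as np
from scipy import stats


def decide(p_value, alpha=0.05):
    """Hypothetical helper: turn a p-value into a decision about H0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Define the hypotheses. H0: the population mean is 5. H1: it is not 5.
hypothesized_mean = 5

# Prepare the data (made-up illustrative values).
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5])

# Choose the test and compute the test statistic and p-value.
t_statistic, p_value = stats.ttest_1samp(sample, hypothesized_mean)

# Set the significance level and compare the p-value against it.
alpha = 0.05
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Decision:", decide(p_value, alpha))
```

Keeping the decision rule in one place makes it easy to reuse the same comparison for any test that reports a p-value.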
1. Hypothesis Testing Using SciPy: One-sample t-test
This test is commonly used to determine if the mean of a sample significantly differs from a hypothesized population mean. We need to define a hypothesized population mean for this test and we can use ttest_1samp() which will return the test statistic value and the p-value. The p-value can then be compared to a specified significance value.
Note that if the p-value is less than the chosen significance level, it means that the observed difference between the sample mean and the hypothesized mean is statistically significant, and we can reject the null hypothesis in favor of the alternative hypothesis.
If the p-value is greater than or equal to the significance level, there is insufficient evidence to reject the null hypothesis.
SciPy program to perform one-sample t-test
# Import scipy stats
import scipy.stats as stats
# Import numpy
import numpy as np
# Preparing data
data = np.array([3, 5, 6, 2, 1, 8, 7, 4, 5, 9])
# Defining a Hypothesized mean
mu = 5
# Performing one-sample t-test
t_statistic, p_value = stats.ttest_1samp(data, mu)
# Display results
print("T-statistic:", t_statistic, "\n")
print("P-value:", p_value, "\n")
# Comparing p-value with significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis because mean difference is statistically significant")
else:
    print("Not enough evidence, Failed to reject the null hypothesis")
Output
T-statistic: 0.0
P-value: 1.0
Not enough evidence, Failed to reject the null hypothesis
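To see where these numbers come from, the sketch below recomputes the test by hand using the standard one-sample formula, t = (sample mean - mu) / (s / sqrt(n)), where s is the sample standard deviation, and then compares the result with ttest_1samp(). The variable names are illustrative, and the p-value is assumed to be two-sided, which is the default of ttest_1samp().

```python
import numpy as np
from scipy import stats

data = np.array([3, 5, 6, 2, 1, 8, 7, 4, 5, 9])
mu = 5

n = data.size
sample_mean = data.mean()
# ddof=1 gives the sample standard deviation (divides by n - 1).
sample_std = data.std(ddof=1)
standard_error = sample_std / np.sqrt(n)

# t-statistic: distance of the sample mean from mu, in standard errors.
t_manual = (sample_mean - mu) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(data, mu)

print("Manual:", t_manual, p_manual)
print("SciPy: ", t_scipy, p_scipy)
```

Because the sample mean here is exactly 5, the numerator is zero, which is why the t-statistic is 0.0 and the p-value is 1.0.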
2. Hypothesis Testing Using SciPy: Independent samples t-test
This test is commonly used to compare the means of two independent samples, that is, to determine whether the two groups differ significantly in their means.
To perform this test, we can use the ttest_ind() function from the scipy.stats module, which calculates the t-statistic and p-value for an independent samples t-test. The p-value is then compared to the significance level (α).
SciPy program to perform independent samples t-test
# Import scipy stats module
import scipy.stats as stats
# Preparing data
data1 = [1, 2, 3, 4, 5]
data2 = [6, 7, 8, 9, 10]
# Performing independent samples t-test
t_, p_value = stats.ttest_ind(data1, data2)
# Display test result and p_value
print("t_statistic:\n", t_, "\n")
print("p_value:\n", p_value, "\n")
# Comparing p-value to significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis because mean difference is statistically significant")
else:
    print("Not enough evidence, Failed to reject the null hypothesis")
Output
t_statistic:
-5.0
p_value:
0.001052825793366539
Reject the null hypothesis because mean difference is statistically significant
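By default, ttest_ind() assumes that both groups have equal population variances. When that assumption is doubtful, passing equal_var=False runs Welch's t-test instead. The sketch below compares the two variants on made-up groups of unequal size and spread; the data and variable names are illustrative only.

```python
from scipy import stats

# Made-up groups with different sizes and clearly different spreads.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 6, 10, 14, 18, 22, 26]

# Standard (pooled-variance) t-test: assumes equal population variances.
student_t, student_p = stats.ttest_ind(group_a, group_b)

# Welch's t-test: does not assume equal population variances.
welch_t, welch_p = stats.ttest_ind(group_a, group_b, equal_var=False)

print("Pooled t-test: t =", student_t, ", p =", student_p)
print("Welch t-test:  t =", welch_t, ", p =", welch_p)
```

The two variants can lead to different p-values when the group sizes and variances differ, so the choice should be made before looking at the result.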
3. Hypothesis Testing Using SciPy: Chi-square test for independence
This test is commonly used to determine if there is a significant association between two categorical variables. The input for this test is a contingency table of observed frequencies, in which each cell counts the observations for one combination of categories.
To perform the Chi-square test for independence, we can use the chi2_contingency() function from the scipy.stats module. It returns the Chi-square statistic, the associated p-value, the degrees of freedom (df), and the expected frequencies based on the observed data.
SciPy program to perform chi-square test for independence
# Import scipy stats
import scipy.stats as stats
# Import numpy
import numpy as np
# Preparing data
observed_data = np.array([[10, 15, 5], [20, 25, 15]])
# Performing Chi-square test for independence
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed_data)
# Display results
print("Chi-square statistic:", chi2_stat, "\n")
print("P-value:", p_value, "\n")
print("Degrees of freedom:", dof, "\n")
print("Expected frequencies:\n", expected)
Output
Chi-square statistic: 0.9374999999999999
P-value: 0.6257840096045911
Degrees of freedom: 2
Expected frequencies:
[[10. 13.33333333 6.66666667]
[20. 26.66666667 13.33333333]]
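As with the t-tests, the p-value can be compared to a significance level to reach a decision. The sketch below also recomputes the expected frequencies by hand as (row total × column total) / grand total, to show where the values in the output come from. The names expected_manual and chi2_manual are illustrative, and alpha = 0.05 is an assumed significance level.

```python
import numpy as np
from scipy import stats

observed_data = np.array([[10, 15, 5], [20, 25, 15]])

# Expected count per cell under independence:
# (row total * column total) / grand total.
row_totals = observed_data.sum(axis=1, keepdims=True)
col_totals = observed_data.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed_data.sum()

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi2_manual = ((observed_data - expected_manual) ** 2 / expected_manual).sum()

chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed_data)
print("Manual statistic:", chi2_manual, "SciPy statistic:", chi2_stat)

# Comparing p-value to significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the variables appear to be associated")
else:
    print("Not enough evidence, Failed to reject the null hypothesis")
```

With a p-value of about 0.63, which is well above 0.05, this data gives no evidence of an association between the two variables.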