Calculating Chi-Square Using Scipy in Python

These notes show how to calculate the expected frequencies and the chi-square statistic for a dataset using SciPy in Python.

First, I need to import the necessary libraries: scipy and numpy.

import numpy as np
import scipy.stats as stats

The approach differs depending on whether I'm calculating the chi-square for a contingency table as a test of independence or comparing a sample to a theoretical distribution.

Chi-Square Test for Independence (Contingency Table)

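For a contingency table, the expected frequencies come first: under independence, the expected count in each cell is its row total times its column total, divided by the grand total. The chi-square statistic then sums the squared differences between observed and expected counts, each divided by the expected count.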

In this example, I store the observed data in a 2x2 contingency table using an array.

observed = np.array([[50, 30], [20, 100]])

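Each row of `observed` corresponds to a category of one variable and each column to a category of the other. Each cell holds the count observed for that combination.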

I then calculate the expected frequencies and the chi-square statistic using the chi2_contingency() function from scipy.stats.

chi2, p_value, dof, expected_frequencies = stats.chi2_contingency(observed)

This function returns:

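  • `chi2`: the chi-square statistic. For a 2x2 table (one degree of freedom), SciPy applies Yates' continuity correction by default (`correction=True`), which is why the value below is slightly smaller than the uncorrected statistic.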
  • `p_value`: the associated p-value.
  • `dof`: the degrees of freedom.
  • `expected_frequencies`: the matrix of expected frequencies.

Finally, I display the results.

print("Expected frequencies:")
print(expected_frequencies)
print("Chi-square value:", chi2)
print("P-value:", p_value)
print("Degrees of freedom:", dof)

Here's the output:

Expected frequencies:
[[28. 52.]
 [42. 78.]]
Chi-square value: 42.33058608058608
P-value: 7.707766001215446e-11
Degrees of freedom: 1
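
To see where these numbers come from, here is a minimal sketch that recomputes them by hand for the same table. The variable names (`row_totals`, `col_totals`, `chi2_yates`, `chi2_plain`) are my own, and the 0.5 adjustment is the textbook form of Yates' correction, which I assume matches what SciPy does for this table because every cell differs from its expected count by more than 0.5.

```python
import numpy as np
import scipy.stats as stats

observed = np.array([[50, 30], [20, 100]])

# Marginal totals and grand total of the table
row_totals = observed.sum(axis=1)   # [ 80, 120]
col_totals = observed.sum(axis=0)   # [ 70, 130]
n = observed.sum()                  # 200

# Expected count per cell under independence: row total * column total / n
expected = np.outer(row_totals, col_totals) / n

# Uncorrected Pearson statistic
chi2_plain = ((observed - expected) ** 2 / expected).sum()

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring
chi2_yates = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()

# Degrees of freedom and p-value from the chi-square survival function
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi2_yates, dof)

print(expected)
print("Uncorrected chi-square:", chi2_plain)   # about 44.32
print("Yates-corrected chi-square:", chi2_yates)  # about 42.33
print("P-value:", p_value)
```

Passing `correction=False` to `stats.chi2_contingency()` should return the uncorrected value instead.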

Chi-Square Goodness of Fit (Single Sample)

When comparing a single sample against a theoretical distribution, I first calculate the expected frequencies and then compute the chi-square value.

In this example, I store the observed frequencies in a list.

observed = [20, 30, 50]

Next, I store the theoretical probabilities for each category in another list.

theoretical_probabilities = [0.25, 0.25, 0.5]

I calculate the total number of observations and store it in the variable N.

N = sum(observed)

The expected frequencies are determined by multiplying the theoretical probabilities by the total number of observations.

expected_frequencies = [p * N for p in theoretical_probabilities]

I then use the stats.chisquare() function to compute the chi-square statistic.

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected_frequencies)

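It compares the observed and expected frequencies and returns the chi-square statistic together with its p-value. By default the p-value is based on k - 1 degrees of freedom, where k is the number of categories (here 3 - 1 = 2).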

Finally, I print the results.

print("Expected frequencies:", expected_frequencies)
print("Chi-square value:", chi2)
print("P-value:", p_value)

Here's the output:

Expected frequencies: [25.0, 25.0, 50.0]
Chi-square value: 2.0
P-value: 0.36787944117144245
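
As a cross-check, the same statistic can be computed directly from its definition. This is only a sketch: the names `chi2_manual` and `p_manual` are mine, and I take the degrees of freedom to be the number of categories minus one, which is the default used by `stats.chisquare()`.

```python
import numpy as np
import scipy.stats as stats

observed = np.array([20, 30, 50])
theoretical_probabilities = np.array([0.25, 0.25, 0.5])

# Expected counts: theoretical probability times the sample size
expected = theoretical_probabilities * observed.sum()

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# P-value: upper tail of the chi-square distribution with k - 1 degrees of freedom
dof = len(observed) - 1
p_manual = stats.chi2.sf(chi2_manual, dof)

print("Chi-square value:", chi2_manual)  # 2.0
print("P-value:", p_manual)              # about 0.368
```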

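In both scenarios, the chi-square statistic measures how far the observed frequencies are from the expected ones, and the p-value tells me whether that difference is statistically significant. In the first example the p-value is far below the usual 0.05 threshold, so the two variables do not look independent. In the second the p-value is about 0.37, so the sample is consistent with the theoretical distribution.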
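
The decision itself can be sketched as a simple comparison. The helper `is_significant` and the significance level of 0.05 are my own assumptions for illustration. The two p-values are copied from the outputs above.

```python
def is_significant(p_value, alpha=0.05):
    """Return True when the p-value falls below the chosen significance level."""
    return p_value < alpha


# P-values taken from the two examples in these notes
independence_p = 7.707766001215446e-11
goodness_of_fit_p = 0.36787944117144245

print(is_significant(independence_p))     # True: reject independence
print(is_significant(goodness_of_fit_p))  # False: no evidence against the theoretical distribution
```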
