How to calculate Point-Biserial Correlation in Python?

A guide to Point-Biserial Correlation in Python: In this tutorial, we will learn what point-biserial correlation is and how to calculate it in Python. By Shivang Yadav. Last updated: September 13, 2023

Previously, we covered partial correlation, where both variables are continuous. Now we will look at the case where one of the variables is binary: point-biserial correlation.

Point-Biserial Correlation

Point-biserial correlation, commonly denoted r_pb, is a statistical measure of the strength and direction of the relationship between a binary variable and a continuous variable. It quantifies the extent to which a continuous variable differs between the two groups defined by the binary variable.

Point-biserial correlation involves the following components:

  1. Binary (dichotomous) Variable: This is a variable that has only two possible categories or outcomes.
    Examples: gender (male or female), presence or absence of a specific condition, yes or no responses, etc.
  2. Continuous Variable: A numerical variable that can take on a wide range of values including decimal points. It represents the measurement of some characteristic or attribute.
    Examples: age, height, test scores, etc.
  3. Point-biserial Correlation: Point-biserial correlation measures the degree of association or relationship between the binary variable and the continuous variable.
  4. Direction of Relation: Point-biserial correlation ranges from -1 to 1 and can be positive or negative.
    A positive value indicates that higher values of the continuous variable are associated with the category coded as 1.
    A negative value indicates that higher values of the continuous variable are associated with the category coded as 0.
  5. Magnitude of Relation: The absolute value of the point-biserial correlation coefficient indicates the strength of the association. A value closer to 1 indicates a stronger association, while a value closer to 0 suggests a weaker association.
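
The sign of the coefficient depends on which category is coded as 1. The following is a minimal sketch of that point, using small made-up data (the `passed` and `hours` lists are illustrative assumptions, not real measurements): recoding the same two groups the other way round flips the sign but leaves the magnitude unchanged.

```python
import scipy.stats as st

# Illustrative data (assumed for this sketch): 1 = passed, 0 = failed.
passed = [0, 0, 0, 1, 1, 1]
hours = [2.0, 3.0, 4.0, 6.0, 7.0, 9.0]

r, _ = st.pointbiserialr(passed, hours)

# Recode the same groups the other way round: 1 = failed, 0 = passed.
failed = [1 - v for v in passed]
r_flipped, _ = st.pointbiserialr(failed, hours)

print(r > 0)                       # group coded 1 has the higher mean
print(r_flipped < 0)               # same data, opposite coding
print(abs(r + r_flipped) < 1e-12)  # only the sign changes
```

Because of this, it helps to state the coding explicitly whenever a point-biserial coefficient is reported.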

Point-biserial correlation is particularly useful when you want to determine whether a continuous variable significantly differs between two groups defined by a binary variable, such as investigating whether there is a difference in test scores between male and female students.
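
To make the link between the coefficient and the two group means concrete, here is a sketch that computes r_pb by hand from the textbook formula r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the group means, n1 and n0 the group sizes, and s_n the population standard deviation of all values. The helper name `point_biserial` is hypothetical and uses only the standard library; the `result` and `hours` lists are the sample data used in this tutorial's example program.

```python
import math
import statistics

def point_biserial(binary, values):
    """Hypothetical helper: r_pb from group means and the population SD."""
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    mean_diff = statistics.mean(group1) - statistics.mean(group0)
    s_n = statistics.pstdev(values)  # population SD (divides by n)
    return mean_diff / s_n * math.sqrt(n1 * n0 / n**2)

result = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
hours = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

r_pb = point_biserial(result, hours)
print(round(r_pb, 4))  # 0.2182
```

The larger the gap between the two group means relative to the overall spread, the larger the absolute value of the coefficient.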

Calculation of Point-Biserial Correlation in Python

Python has libraries for most statistical operations. Point-biserial correlation can be calculated with SciPy, a third-party library that is not part of Python's standard library and must be installed separately.

SciPy's scipy.stats module provides the pointbiserialr() function, which returns the point-biserial correlation coefficient together with the p-value of a test for no association between the two variables.

Syntax

scipy.stats.pointbiserialr(x, y)

Here,

  • x is the binary variable (an array-like of 0/1 or boolean values)
  • y is the continuous variable (an array-like with the same length as x)
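
Point-biserial correlation is mathematically the Pearson correlation applied to a 0/1 variable and a continuous variable. As a rough check under that assumption, the sketch below compares `pointbiserialr()` with `scipy.stats.pearsonr()` on the sample data used in this tutorial; both are expected to agree up to floating-point error.

```python
import scipy.stats as st

result = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
hours = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

# Both results can be unpacked as (coefficient, p-value).
r_pb, p_pb = st.pointbiserialr(result, hours)
r_pearson, p_pearson = st.pearsonr(result, hours)

print(abs(r_pb - r_pearson) < 1e-9)  # same coefficient
print(abs(p_pb - p_pearson) < 1e-9)  # same p-value
```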

Python program to compute the Point-Biserial Correlation

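import scipy.stats as st

# Binary variable, coded 0/1 (one value per observation)
result = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
# Continuous variable, in the same order as result
hours = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

# Returns the correlation coefficient and its p-value
pointBiserialCorr = st.pointbiserialr(result, hours)

print(f"Point Biserial Correlation: {pointBiserialCorr}")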

Output:

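Point Biserial Correlation: PointbiserialrResult(correlation=0.21816345457887468, pvalue=0.519284292877361)

Here, the coefficient of about 0.22 indicates a weak positive association, and the p-value of about 0.52 means the association is not statistically significant at the usual 0.05 level. The name of the result object and its fields may differ between SciPy versions (recent releases may print something like SignificanceResult(statistic=..., pvalue=...)), but the two numbers have the same meaning.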
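
In practice it is usually more convenient to unpack the result into two numbers than to print the whole object. The sketch below does that and, as a cross-check, compares the p-value with an independent two-sample t-test (`scipy.stats.ttest_ind` with its default equal-variance setting), which tests the same hypothesis about the two group means. The significance level `alpha = 0.05` is an assumption chosen for illustration.

```python
import scipy.stats as st

result = [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
hours = [12, 14, 17, 17, 11, 22, 23, 11, 19, 8, 12]

# Tuple unpacking avoids depending on version-specific attribute names.
r, p_value = st.pointbiserialr(result, hours)

alpha = 0.05  # assumed significance level
print(f"r_pb = {r:.3f}, p = {p_value:.3f}")  # r_pb = 0.218, p = 0.519
print("significant" if p_value < alpha else "not significant")

# Cross-check: split the continuous values by group and run a t-test.
group1 = [h for flag, h in zip(result, hours) if flag == 1]
group0 = [h for flag, h in zip(result, hours) if flag == 0]
t_stat, t_p = st.ttest_ind(group1, group0)  # equal variances assumed

print(abs(p_value - t_p) < 1e-9)  # both tests give the same p-value
```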





