
Python and R Tips

Learn Data Science with Python and R


Computing Correlation with Numpy corrcoef()

December 25, 2022 by cmdlinetips

In this post, we will learn how to use Numpy's corrcoef() function to compute the correlation between two datasets stored in lists or arrays. Numpy's corrcoef() function calculates the Pearson correlation coefficient, a measure of how strongly two variables are linearly related.
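To see what corrcoef() computes, here is a minimal sketch that evaluates the Pearson coefficient directly from its definition, r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))**2) * sum((y - mean(y))**2)), and compares it with np.corrcoef(). The helper name pearson_from_definition and the sample lists are illustrative assumptions, not part of Numpy.

```python
import numpy as np

def pearson_from_definition(a, b):
    """Illustrative helper: Pearson r computed straight from its definition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Center each variable on its own mean
    da = a - a.mean()
    db = b - b.mean()
    # Sum of cross-products divided by the product of the root sums of squares
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

# Made-up example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

manual = pearson_from_definition(x, y)
builtin = np.corrcoef(x, y)[0, 1]  # off-diagonal element of the 2x2 matrix
print(manual, builtin)
print(np.isclose(manual, builtin))  # True
```

Writing the formula out once makes it clear that corrcoef() is just this normalized covariance, evaluated for every pair of variables.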

The resulting correlation coefficient ranges from -1 to 1. A coefficient of 1 indicates a perfect positive linear relationship (as one variable increases, the other also increases, with all points lying exactly on a straight line), while a coefficient of -1 indicates a perfect negative linear relationship (as one variable increases, the other decreases along a straight line). A coefficient of 0 indicates no linear relationship between the two variables, although a nonlinear relationship may still exist.
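A quick way to get a feel for this range is to correlate a small array with a few transformed copies of itself. This sketch uses made-up data: a linear increase should give 1, a linear decrease should give -1, and y = x**2 on a range symmetric around zero gives 0 even though y is completely determined by x, which illustrates that Pearson correlation only measures linear association.

```python
import numpy as np

# Made-up data, symmetric around zero
x = np.array([-2, -1, 0, 1, 2])

# Perfect increasing linear relationship -> 1
r_pos = np.corrcoef(x, 3 * x + 1)[0, 1]
# Perfect decreasing linear relationship -> -1
r_neg = np.corrcoef(x, -2 * x + 5)[0, 1]
# y = x**2 depends entirely on x, but not linearly -> 0
r_zero = np.corrcoef(x, x ** 2)[0, 1]

print(round(r_pos, 6), round(r_neg, 6), round(r_zero, 6))
```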

How to compute correlation between two variables in Numpy

To use the corrcoef() function, we pass in the two sets of data as arguments. The function returns a symmetric matrix of correlation coefficients whose diagonal elements are 1, since each variable is perfectly correlated with itself. The off-diagonal elements hold the correlation between the two variables, so the upper triangular elements are the same as the lower triangular elements.
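As an illustrative check with made-up lists, passing x and y separately should give the same result as stacking them as two rows of a single array (the NumPy documentation describes y as an additional set of variables appended to x), and the resulting matrix is symmetric with ones on the diagonal:

```python
import numpy as np

# Made-up example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Passing x and y as two arguments...
c_pair = np.corrcoef(x, y)
# ...matches stacking them as the two rows of one 2D array
c_stacked = np.corrcoef(np.vstack([x, y]))

print(np.allclose(c_pair, c_stacked))     # True
print(np.allclose(c_pair, c_pair.T))      # True: the matrix is symmetric
print(np.allclose(np.diag(c_pair), 1.0))  # True: each variable vs itself
```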

Let us consider a simple example to compute correlation using corrcoef(). We have two lists of numbers.

import numpy as np

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

Let us compute the correlation coefficients with the corrcoef() function.

correlation = np.corrcoef(x, y)
print(correlation)
[[ 1. -1.]
 [-1.  1.]]

As you can see, the corrcoef() function returns a 2×2 matrix of correlation coefficients, where the element in row i and column j is the correlation between the i-th and j-th variables (here, x and y).

The first element in the matrix, correlation[0][0], is the correlation between x and itself, which is always 1. The second element, correlation[0][1], is the correlation between x and y, which in this case is -1 because y decreases by exactly one for every unit increase in x, a perfect negative linear relationship.
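When only the single coefficient is needed, indexing the off-diagonal element is enough. The helper below, pair_correlation, is a hypothetical convenience wrapper for illustration, not a Numpy function:

```python
import numpy as np

def pair_correlation(a, b):
    """Hypothetical helper: return only the scalar Pearson r for a and b."""
    # For two 1D inputs np.corrcoef returns a 2x2 matrix; [0, 1] is the off-diagonal entry
    return np.corrcoef(a, b)[0, 1]

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

r = pair_correlation(x, y)
print(r)
```

Because the matrix is symmetric, indexing [1, 0] instead would return the same value, and swapping the arguments does not change the result.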

Computing correlation on 2D array with Numpy corrcoef

Numpy's corrcoef() can also compute correlations on a matrix, that is, a 2D Numpy array. We simply pass the 2D array as the input argument and get a correlation matrix as output.

Let us simulate some data in a 2D Numpy array. Here is a 3×5 array of random integers between 0 and 19.

x = np.random.randint(20, size=(3,5))
x

array([[16, 17,  7,  9, 16],
       [ 3,  7,  6,  9, 10],
       [15, 12,  7, 11,  5]])
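Because np.random.randint() draws new values on every run, your array (and the correlations computed from it) will differ from the ones shown here. One option, sketched below with Numpy's newer Generator API and an arbitrary seed of 42, is to seed the random number generator so the data are reproducible. Its values will not match the printed array, so the rest of this post keeps using that array.

```python
import numpy as np

# Seed 42 is an arbitrary choice; any fixed seed makes the draw repeatable
rng = np.random.default_rng(42)
# Integers in [0, 20), arranged as 3 rows x 5 columns
x_seeded = rng.integers(20, size=(3, 5))
print(x_seeded)
```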

Here we compute the correlation coefficient matrix for all pairs of rows in the 2D array using Numpy's corrcoef().

np.corrcoef(x)

Since we have 3 rows, we get a 3×3 correlation matrix, with each row's correlation with itself (1) along the diagonal.

array([[ 1.        , -0.0984374 ,  0.29654013],
       [-0.0984374 ,  1.        , -0.6846532 ],
       [ 0.29654013, -0.6846532 ,  1.        ]])
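To reproduce this output exactly, a minimal sketch can hard-code the printed array instead of drawing new random numbers. Because the matrix is symmetric, each distinct pair of rows can be read once from the upper triangle using np.triu_indices() with k=1, which skips the diagonal:

```python
import numpy as np

# The exact array printed above, hard-coded so the result is reproducible
x = np.array([[16, 17,  7,  9, 16],
              [ 3,  7,  6,  9, 10],
              [15, 12,  7, 11,  5]])

corr = np.corrcoef(x)

# Indices of the upper triangle, excluding the diagonal (k=1)
rows, cols = np.triu_indices(corr.shape[0], k=1)
for i, j in zip(rows, cols):
    print(f"row {i} vs row {j}: {corr[i, j]:.4f}")
```

This prints -0.0984, 0.2965 and -0.6847 for the row pairs (0, 1), (0, 2) and (1, 2), matching the matrix above.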

Computing column-wise correlation with Numpy corrcoef

Another argument of interest to Numpy's corrcoef() is rowvar. By default, rowvar is True: each row represents a variable, with observations in the columns. When rowvar is False, each column represents a variable, while the rows contain the observations.

np.corrcoef(x, rowvar=False)

Numpy's corrcoef() function then computes the correlation between every pair of columns. For the example data we get a 5×5 correlation matrix, as there are 5 columns.

array([[ 1.        ,  0.89851257,  0.99760861,  0.43894779,  0.12131025],
       [ 0.89851257,  1.        ,  0.8660254 ,  0.        ,  0.54470478],
       [ 0.99760861,  0.8660254 ,  1.        ,  0.5       ,  0.05241424],
       [ 0.43894779,  0.        ,  0.5       ,  1.        , -0.83862787],
       [ 0.12131025,  0.54470478,  0.05241424, -0.83862787,  1.        ]])
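Setting rowvar=False should be equivalent to transposing the array so that its columns become rows. A small sketch, again hard-coding the example array printed earlier, checks this:

```python
import numpy as np

# Same example array as above
x = np.array([[16, 17,  7,  9, 16],
              [ 3,  7,  6,  9, 10],
              [15, 12,  7, 11,  5]])

col_corr = np.corrcoef(x, rowvar=False)

# Treating columns as variables is the same as correlating the rows of x.T
print(np.allclose(col_corr, np.corrcoef(x.T)))  # True
print(col_corr.shape)                            # (5, 5)
```

Which form to use is mostly a matter of readability; rowvar=False avoids an explicit transpose when your data follow the common "one observation per row" layout.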
