Python and R Tips

Learn Data Science with Python and R

Linear Regression Using Matrix Multiplication in Python Using NumPy

March 17, 2020 by cmdlinetips

Linear regression is one of the most commonly used statistical techniques for understanding the linear relationship between two or more variables. Because it is so common, there are a number of ways to perform linear regression analysis in Python. In this post we will do linear regression more or less from scratch, using matrix multiplication with NumPy instead of a ready-made function.

Let us first load the Python packages we will use to build a linear regression with matrix multiplication and NumPy’s linear algebra module.

import pandas as pd
import numpy as np
# import matplotlib
import matplotlib.pyplot as plt
# import seaborn
import seaborn as sns
%matplotlib inline

To build the linear regression we will use the classic cars data from cmdlinetips.com’s GitHub page.

data_url = 'https://raw.githubusercontent.com/cmdlinetips/data/master/cars.tsv'
cars = pd.read_csv(data_url, sep="\t")

The cars dataset contains the speed of cars and the distance they needed to stop, recorded in the 1920s.

print(cars.head(n=3))
   speed  dist
0      4     2
1      4    10
2      7     4

Let us first visualize the relationship between speed and dist variables using a scatter plot.

# pass x and y as keywords; recent seaborn versions reject positional x/y
bplot = sns.scatterplot(x='dist', y='speed', data=cars)
bplot.axes.set_title("dist vs speed: Scatter Plot",
                    fontsize=16)
bplot.set_ylabel("Speed (mph)", 
                fontsize=16)
bplot.set_xlabel("Distances taken to stop (feet)", 
                fontsize=16)

We can see a clear linear relationship between the two variables.

Scatter Plot for Linear Regression Python

Let us assign the two columns to the variables X and Y, where X is the predictor variable

X = cars.dist.values

and Y is the response variable.

Y = cars.speed.values

Our observed data are pairs of x and y values.

Data for Linear Regression


With a linear regression model, we fit the observed data with the linear model shown below and estimate its parameters.
Linear Regression Model:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

Here beta_0 and beta_1 are the intercept and slope of the linear equation. We can combine the predictor variables into a matrix. In our example we have one predictor variable, so we create a matrix with a column of ones as the first column and X as the second.

We use NumPy’s vstack to stack the two 1-D arrays as rows and then transpose the result to get the 2-D array X_mat.

X_mat=np.vstack((np.ones(len(X)), X)).T
X_mat[0:5,]
array([[ 1.,  2.],
       [ 1., 10.],
       [ 1.,  4.],
       [ 1., 22.],
       [ 1., 16.]])
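The same construction can be wrapped in a small helper that also handles more than one predictor. The sketch below is only an illustration: the function name add_intercept is hypothetical (it is not part of NumPy or of this post's code), and the input is just the first five dist values printed above.

```python
import numpy as np

def add_intercept(x):
    """Return a design matrix with a leading column of ones.

    Hypothetical helper: accepts a 1-D array (one predictor)
    or a 2-D array (one column per predictor).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]           # turn the vector into a single column
    ones = np.ones((x.shape[0], 1))    # intercept column
    return np.hstack((ones, x))

# first five dist values from the output above
X_demo = add_intercept([2, 10, 4, 22, 16])
print(X_demo)
```

For a single predictor this should give the same rows as X_mat[0:5,] above.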

Linear Regression Model Estimates using Matrix Multiplications

With a little linear algebra, and with the goal of minimizing the mean squared error of a system of linear equations, we can write the parameter estimates as the matrix multiplications shown below.
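For readers who want the intermediate steps, one common way to get there is sketched here. The notation is an assumption for this sketch: $\mathbf{X}$ stands for the matrix X_mat (ones column plus predictor), $Y$ for the response vector, and $S(\beta)$ for the sum of squared errors, which has the same minimizer as the mean squared error.

```latex
\begin{aligned}
S(\beta) &= \lVert Y - \mathbf{X}\beta \rVert^2
          = (Y - \mathbf{X}\beta)^T (Y - \mathbf{X}\beta) \\
\frac{\partial S}{\partial \beta} &= -2\,\mathbf{X}^T (Y - \mathbf{X}\beta) = 0 \\
\mathbf{X}^T \mathbf{X}\,\beta &= \mathbf{X}^T Y
\end{aligned}
```

When $\mathbf{X}^T \mathbf{X}$ is invertible, solving this last system for $\beta$ gives the closed-form estimate.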

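Parameter Estimates of Linear Regression:

$$\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T Y$$

Here $\mathbf{X}$ is the matrix X_mat built above and $\hat{\beta}$ is the vector of estimates.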

We can implement this using the matrix inverse function in NumPy’s linalg module together with matrix multiplication.

beta_hat = np.linalg.inv(X_mat.T.dot(X_mat)).dot(X_mat.T).dot(Y)

The variable beta_hat contains the estimates of the two parameters of the linear model, which we computed with matrix multiplication.

print(beta_hat)
[8.28390564 0.16556757]
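As a side note, the explicit inverse is not the only way to evaluate this expression in NumPy. The sketch below compares three routes on synthetic stand-in data (an assumption, so that it runs without downloading the cars file): the explicit inverse used above, np.linalg.solve on the normal equations, and np.linalg.lstsq.

```python
import numpy as np

# Synthetic stand-in data (not the cars dataset): y = 3 + 2*x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

# design matrix: column of ones, then the predictor
X_mat = np.column_stack((np.ones_like(x), x))

# 1) explicit inverse, as in the post
beta_inv = np.linalg.inv(X_mat.T @ X_mat) @ X_mat.T @ y

# 2) solve the normal equations (X^T X) beta = X^T y without forming the inverse
beta_solve = np.linalg.solve(X_mat.T @ X_mat, X_mat.T @ y)

# 3) NumPy's least-squares routine; only the first return value is needed here
beta_lstsq, *_ = np.linalg.lstsq(X_mat, y, rcond=None)

print(beta_inv, beta_solve, beta_lstsq)
```

On a small, well-conditioned problem like this one all three should agree to within floating-point error; whichever route is used, the result is the same two-element estimate.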

beta_hat is a vector containing the y-axis intercept and the slope of the linear regression model. Let us use these parameters to estimate the values of Y from the X values.

# predict using coefficients
yhat = X_mat.dot(beta_hat)
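One way to sanity-check predictions like these is to look at the mean squared error that the estimate is supposed to minimize. The following is a minimal sketch on hypothetical toy values (not the cars data); the helper name mse is made up for this example.

```python
import numpy as np

# Toy stand-in data (hypothetical values, not the cars dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X_mat = np.column_stack((np.ones_like(x), x))

# same matrix formula as in the post
beta_hat = np.linalg.inv(X_mat.T @ X_mat) @ X_mat.T @ y

def mse(beta):
    """Mean squared error of the line defined by beta on the toy data."""
    residuals = y - X_mat @ beta
    return np.mean(residuals ** 2)

yhat = X_mat @ beta_hat
print("MSE at beta_hat:", mse(beta_hat))

# Nudging either coefficient away from beta_hat should not lower the MSE
for delta in ([0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]):
    assert mse(beta_hat + np.array(delta)) >= mse(beta_hat)
```

The residuals y - yhat should also sum to roughly zero, which follows from including the column of ones in the matrix.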

We can visualize our estimate of yhat with the scatter plot.

# plot data and predictions
plt.scatter(X, Y)
plt.plot(X, yhat, color='red')

Linear Regression fit with Matrix Multiplication in Python

We can clearly see that our estimates nicely show the linear relationship between X and Y. Let us double-check the parameter estimates we got by matrix multiplication using scikit-learn’s LinearRegression model.

Verifying Linear Regression Model Estimates using Scikit-learn

Let us load scikit-learn’s linear regression module.

from sklearn.linear_model import LinearRegression

We can build a linear regression model by first creating the object and then fitting the model to the data.

regression = LinearRegression()
linear_model = regression.fit(X[:,np.newaxis],Y)
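The X[:,np.newaxis] indexing is there because scikit-learn expects the predictors as a 2-D array with one column per feature. A small NumPy-only sketch of what that indexing does, using the first five dist values shown earlier as the input:

```python
import numpy as np

X = np.array([2, 10, 4, 22, 16])    # first five dist values shown earlier

# np.newaxis adds a trailing axis: shape (5,) becomes (5, 1)
X_col = X[:, np.newaxis]
print(X.shape, X_col.shape)

# reshape(-1, 1) is an equivalent spelling of the same operation
assert np.array_equal(X_col, X.reshape(-1, 1))
```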

We can extract the parameters of the model using the “intercept_” and “coef_” attributes. And we can see that the estimates match the ones we obtained with the matrix multiplication method.

print(linear_model.intercept_)
8.283905641787172
print(linear_model.coef_)
[0.16556757]
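Rather than comparing the printed numbers by eye, the two sets of estimates can be compared with np.allclose, which allows for small floating-point differences. The sketch below simply copies the values printed above (so they are rounded, not the full-precision results); the names sk_intercept, sk_coef and sk_params are made up for this example.

```python
import numpy as np

# values copied from the printed output above (rounded by print)
beta_hat = np.array([8.28390564, 0.16556757])   # matrix multiplication
sk_intercept = 8.283905641787172                # linear_model.intercept_
sk_coef = np.array([0.16556757])                # linear_model.coef_

# put intercept and slope into one vector in the same order as beta_hat
sk_params = np.concatenate(([sk_intercept], sk_coef))

print(np.allclose(beta_hat, sk_params))
```

In a live session the same check would be applied to beta_hat and the fitted model's attributes directly.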

In summary, we built a linear regression model in Python from scratch using matrix multiplication and verified our results using scikit-learn’s linear regression model. Solving the system of linear equations with matrix multiplication is just one way to do linear regression analysis from scratch. One can also use a number of matrix decomposition techniques like SVD, Cholesky decomposition and QR decomposition. That is a good topic for another blog post on linear regression in Python with linear algebra techniques.
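As a brief preview of the decomposition route, here is a minimal sketch of the QR and SVD approaches. It uses synthetic stand-in data (an assumption, chosen only to resemble the scale of the cars example) and compares both results with the matrix-multiplication estimate from this post.

```python
import numpy as np

# Synthetic stand-in data (not the cars dataset)
rng = np.random.default_rng(1)
x = rng.uniform(0, 25, size=40)
y = 8.0 + 0.17 * x + rng.normal(scale=1.0, size=x.size)
X_mat = np.column_stack((np.ones_like(x), x))

# QR: X = Q R with orthonormal columns in Q and upper-triangular R,
# so the least-squares problem reduces to solving R beta = Q^T y
Q, R = np.linalg.qr(X_mat)
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD route: np.linalg.pinv computes the pseudo-inverse via the SVD
beta_svd = np.linalg.pinv(X_mat) @ y

# matrix-multiplication estimate, as in the post
beta_normal = np.linalg.inv(X_mat.T @ X_mat) @ X_mat.T @ y

print(beta_qr, beta_svd, beta_normal)
```

The decomposition routes avoid forming X^T X, which tends to matter more when the predictors are nearly collinear than in a two-column example like this one.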
