Simple Linear Regression
The difference between simple linear regression and multiple linear regression is that simple linear regression has only one independent variable, whereas multiple linear regression has more than one. In this post, we will look at how simple linear regression works.
In linear regression, the dependent variable (also called the target) is continuous, and the independent variables (also called predictors) can be continuous or discrete. The simple linear regression model is represented by the following equations:

$latex f(x) = \theta_0 + \theta_1 x$

$latex y_i = f(x_i) + \epsilon_i$
where $latex \theta_0$ is the intercept, $latex \theta_1$ is the slope of the line, and $latex \epsilon_i$ is the error term. The regression function f(x) is used to predict the value of the target variable $latex y_i$ based on the predictor variable x. $latex \epsilon_i$ represents the difference between the estimated value f(x) and the actual value $latex y_i$. The regression coefficients $latex \theta_0$ and $latex \theta_1$ are the parameters of the regression model. The model finds the best-fit line by adjusting these parameters.
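Concretely, ordinary least squares chooses the parameters that minimize the sum of squared errors $latex \sum_{i=1}^{n} \epsilon_i^2$. For simple linear regression this has a well-known closed-form solution:

$latex \hat{\theta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}$

where $latex \bar{x}$ and $latex \bar{y}$ are the sample means of the predictor and the target.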
Python Code
First of all, let’s generate random data that shows a linear relationship between an input x and an output y as follows:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# 20 points around x ~ N(3, 1); y follows x plus Gaussian noise with mean 2.0,
# so the true slope is about 1 and the true intercept is about 2.
x = np.random.normal(3.0, 1.0, 20)
y = x + np.random.normal(2.0, 0.4, 20)

plt.scatter(x, y)
plt.show()
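Because the data is drawn at random, the exact numbers reported below will differ from run to run; seeding the generator before sampling (for example, np.random.seed(0)) makes the results reproducible.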
The following code fits the regression line using scipy.stats.linregress from the SciPy library:
import matplotlib.pyplot as plt
from scipy import stats

# Fit the least-squares line; linregress also returns the correlation
# coefficient (r_value), the p-value, and the standard error of the slope.
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

def predict(x):
    return slope * x + intercept

fitLine = predict(x)

plt.scatter(x, y)
plt.plot(x, fitLine, c='r')  # regression line in red
plt.show()
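As a quick sanity check (a small sketch that is not part of the original post, assuming the x and y arrays defined above), the same coefficients can be computed directly from the closed-form least-squares formulas with NumPy:

# Closed-form ordinary least squares for a single predictor
x_mean, y_mean = x.mean(), y.mean()
slope_manual = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept_manual = y_mean - slope_manual * x_mean
print(slope_manual, intercept_manual)  # should match linregress's slope and intercept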
In this case, $latex \theta_0$ (intercept) is 1.9813276901908634 and $latex \theta_1$ (slope) is 0.95614427225069154. Not surprisingly, the R-squared value shows a very good fit:
r_value ** 2
0.89459544947919656
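Equivalently (a small sketch assuming the y array and predict function defined above), R-squared can be computed from the residuals as $latex 1 - SS_{res}/SS_{tot}$:

ss_res = np.sum((y - predict(x)) ** 2)  # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
print(1 - ss_res / ss_tot)              # matches r_value ** 2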
The following plot shows the regression line with the intercept and the slope: