Machine Learning | Regression Models

January 3, 2024

Regression is a statistical approach for modeling the relationship between a dependent variable and a set of independent variables. It is a supervised machine learning technique that makes predictions from a set of features in a dataset. The coefficient of each predictor and the y-intercept are estimated by fitting the model to training data, and the final model is the one that minimizes the sum of squared residuals.


Model Selection Based on Performance

$$MSE = \dfrac{1}{n} \displaystyle\sum_{i = 1}^n (y_{i} - f(x_{i}))^2$$
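The MSE formula translates directly into code: average the squared differences between observed and predicted values. A minimal sketch in plain Python (the toy data is illustrative only):

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    n = len(y_true)
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / n

# Hypothetical observed vs. predicted values.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]
print(mse(y_true, y_pred))  # (0.25 + 0.0 + 1.0) / 3
```

Comparing candidate models by their MSE on held-out data is a common way to operationalize "minimizes the sum of squared residuals."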


Measuring Regression Model Accuracy

$$R^2 = \dfrac{TSS - RSS}{TSS} = 1 - \dfrac{RSS}{TSS}$$

$$RSS = \displaystyle\sum_{i = 1}^n (y_{i} - f(x_{i}))^2$$

$$TSS = \displaystyle\sum_{i = 1}^n (y_{i} - \bar{y})^2$$

$$RSE = \sqrt{\dfrac{1}{n - p - 1} \cdot RSS}$$
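The four accuracy measures above can be computed together from the observed values, the predictions, and the number of predictors $p$. A minimal sketch in plain Python (the toy data and the function name are illustrative only):

```python
import math

def regression_metrics(y_true, y_pred, p):
    """Compute RSS, TSS, R^2, and RSE for a model with p predictors."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - y_bar) ** 2 for y in y_true)              # total sum of squares
    r2 = 1 - rss / tss
    rse = math.sqrt(rss / (n - p - 1))
    return rss, tss, r2, rse

# Toy data: a near-perfect fit from a single-predictor (p = 1) model.
rss, tss, r2, rse = regression_metrics(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.2, 3.8, 5.0],
    p=1,
)
print(r2)  # close to 1, since RSS is small relative to TSS
```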


Ordinary Least Squares Linear Regression

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$$
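The OLS coefficients can be estimated by least squares on a design matrix whose first column of ones represents the intercept. A minimal sketch with NumPy, using synthetic data with a known intercept and slope so the fit can be checked:

```python
import numpy as np

# Synthetic single-predictor data: y = 2 + 3x plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=x.shape)

# Design matrix: a column of ones for the intercept, then the predictor.
X = np.column_stack([np.ones_like(x), x])

# Least-squares fit minimizes the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [2.0, 3.0]
```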


Ridge Regression

$$\displaystyle\sum_{i = 1}^n (y_{i} - f(x_{i}))^2 + \lambda \displaystyle\sum_{j = 1}^p \beta_j^2$$
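The ridge objective has a closed-form minimizer, $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$. A minimal sketch with NumPy, assuming the predictors are standardized and $y$ is centered so no unpenalized intercept is needed; the data is synthetic:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X^T X + lam * I)^{-1} X^T y.
    Assumes standardized columns of X and centered y (no intercept term)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=100)

beta_small = ridge_fit(X, y, lam=0.01)
beta_large = ridge_fit(X, y, lam=100.0)
# A larger lambda shrinks all coefficients toward zero.
print(np.linalg.norm(beta_small), np.linalg.norm(beta_large))
```

As $\lambda \to 0$ the solution approaches OLS; as $\lambda$ grows, the coefficients shrink toward zero but (unlike the lasso) do not become exactly zero.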


Lasso Regression

$$\displaystyle\sum_{i = 1}^n (y_{i} - f(x_{i}))^2 + \lambda \displaystyle\sum_{j = 1}^p |\beta_j|$$
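Unlike ridge, the lasso has no closed form, but it can be solved by cyclic coordinate descent with soft-thresholding; this is what drives some coefficients exactly to zero. A minimal sketch with NumPy, assuming standardized predictors and centered $y$; note the threshold is $\lambda/2$ because the objective above uses RSS rather than RSS/2:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent for RSS + lam * sum |beta_j|.
    Sketch; assumes standardized columns of X and centered y."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding predictor j's current contribution.
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = soft_threshold(z, lam / 2.0) / (X[:, j] @ X[:, j])
    return beta

# Synthetic data where only the first of three predictors matters.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, 0.0, 0.0]) + rng.normal(0, 0.1, size=100)

beta = lasso_cd(X, y, lam=50.0)
print(beta)  # the penalty drives the irrelevant coefficients to zero
```

The $\ell_1$ penalty thus performs variable selection as well as shrinkage, which is the practical distinction from ridge regression.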