Linear Regression in Machine Learning
16 April 2021
Machine learning is one of the important applications of Artificial Intelligence (AI): it gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves in order to predict an output. There are several regression techniques in machine learning, such as Linear Regression, Logistic Regression, Ridge Regression, Lasso Regression, Polynomial Regression, and Bayesian Linear Regression. Linear regression is the most basic and most important of these. Linear Regression is a supervised machine learning algorithm whose predicted output is continuous and has a constant slope.
Since regression is one of the most common problems in supervised learning, every machine learning engineer should have a thorough understanding of how it works. Machine learning is a field in which you see new advances every day, yet linear regression remains one of the earliest and most widely used algorithms in it. Linear regression has been in use since at least 1911, and it is a great way to ground yourself in traditional statistics because it has been analyzed and applied extensively by statisticians. It is used to predict values within a continuous range, such as sales or order volume, rather than classifying them into categories such as male or female. Linear regression attempts to measure the correlation between the provided input data and the response variable. For example, you can use linear regression to check whether there is a correlation between height and weight and, if there is, how strong it is: both to understand the relationship between the two and to predict a weight given a height.
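To make the height/weight example concrete, here is a minimal NumPy sketch; the height and weight values are made up for illustration, not real measurements:

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) measurements -- illustrative only
height = np.array([150, 160, 165, 170, 175, 180, 185])
weight = np.array([52, 58, 62, 66, 70, 75, 80])

# Strength of the linear relationship between the two variables
r = np.corrcoef(height, weight)[0, 1]
print(f"correlation coefficient: {r:.3f}")

# Fit a straight line (degree-1 polynomial): weight ~ slope * height + intercept
slope, intercept = np.polyfit(height, weight, 1)

# Predict the weight for a new height
new_height = 172
print(f"predicted weight: {slope * new_height + intercept:.1f} kg")
```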
Linear Regression rests on a few basic assumptions, commonly stated as linearity of the relationship, independence of the observations, homoscedasticity (constant variance of the errors), and normality of the residuals. One must test the data against these before applying Linear Regression; if any of them is violated, the results may be inappropriate. There are mainly two types: simple regression (one independent variable) and multiple regression (several). Broadly, the purpose of Linear Regression is twofold: analysis and prediction. Analysis: linear regression helps you understand and quantify the relationship within your numerical data, in this case quantifying the numeric independent variables and correlating them with a continuous numeric response (dependent) variable.
Prediction: Once you have built your model, you can predict an output or response from a given set of inputs or independent variables. For example, given the areas and prices of 50 shops, we can predict what the price of our own shop will be. To predict, one needs a hypothesis: a function to which we provide the data and which gives us the expected output. That hypothesis should fit the data well enough to produce accurate predictions, as in the sketch below.
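The following is a minimal sketch of that idea using scikit-learn; the area/price numbers are made up for illustration, not taken from any real dataset. The fitted model plays the role of the hypothesis:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: shop area (sq. ft) and sale price -- made-up numbers
area = np.array([500, 750, 1000, 1250, 1500, 2000]).reshape(-1, 1)
price = np.array([60_000, 85_000, 115_000, 140_000, 162_000, 210_000])

# The fitted model is our hypothesis: price ~ coefficient * area + intercept
model = LinearRegression()
model.fit(area, price)

# Predict the price of our own shop from its area
our_shop_area = np.array([[1100]])
predicted_price = model.predict(our_shop_area)
print(f"predicted price: {predicted_price[0]:,.0f}")
```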
Python Libraries: statsmodels vs. scikit-learn: As of this writing, these appear to be the two most popular libraries for fitting linear regression models. Not to generalize, but I usually prefer the statsmodels API because, once you have built the model, it provides convenient access to a number of attributes that scikit-learn does not, and the fitted results object has a helpful .summary() method. That method reports many useful quantities, such as the R-squared and the F-statistic (see the sketch below). The key Linear Regression terms, defined after the sketch, are: independent variable, dependent variable, residuals, and coefficients/weights.
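Here is a minimal statsmodels sketch on the same kind of made-up area/price data. Note that statsmodels does not add an intercept automatically, and that .summary() is called on the fitted results object:

```python
import numpy as np
import statsmodels.api as sm

# Made-up area/price data, as in the earlier sketch
area = np.array([500, 750, 1000, 1250, 1500, 2000], dtype=float)
price = np.array([60_000, 85_000, 115_000, 140_000, 162_000, 210_000], dtype=float)

# Add an explicit intercept column, since statsmodels will not do it for us
X = sm.add_constant(area)

results = sm.OLS(price, X).fit()

# summary() reports R-squared, the F-statistic, coefficients, p-values, etc.
print(results.summary())

# Individual attributes are also directly accessible
print(results.rsquared, results.fvalue)
print(results.params)  # [intercept, slope]
```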
Independent variable: Also known as the explanatory, exogenous, input, or predictor variable. These are your observations and input data, and what linear regression attempts to correlate with your dependent/output data.
Dependent variable: Also known as the response, endogenous, or output variable. It is the value whose correlation with the independent variables you are measuring, and the value you are trying to predict.
Residuals: Sometimes the residuals are called the “error.” A residual is the difference between the predicted and the actual response; on a plot, it is the distance between a data point and the linear regression prediction line. The more relevant independent variables you add, the smaller these differences become, indicating a model with greater accuracy or predictive power.
Weights or Coefficients: These represent, or measure, the correlation between your input (explanatory) variables and the output (response) variable. A small sketch tying these terms together follows.
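To tie the terms together, here is a short sketch on made-up data that fits a model, reads its coefficient and intercept, and computes the residuals as actual minus predicted:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up input (independent) and output (dependent) data
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(X, y)

# Coefficients/weights: the learned relationship between input and output
print("coefficient:", model.coef_[0], "intercept:", model.intercept_)

# Residuals: difference between the actual and the predicted response
residuals = y - model.predict(X)
print("residuals:", residuals)
```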