Linear Regression – Understanding the details

Hi there, welcome. This post continues our exploration of regression, one part at a time. To catch up on the background, please see the previous post at:
https://codingmachinelearning.wordpress.com/2016/12/03/understanding-regression/

In the last post, we saw a plain-English version of linear regression, devoid of all the mathematics and optimization details. This post aims at understanding the mathematical nuances and the beauty of simple gradient descent optimization for obtaining the line of best fit. Just to recap: linear regression gives us a line of best fit, and we claim that this line captures the trend and pattern in the data. Here is what we are going to do:

  • Obtain and understand the data set
  • Loading the data set
  • Steps in coding the classifier
  • Understanding the bias term in linear regression
  • Defining error in linear regression

Obtain and understand the data set

We will use a simple house price prediction data set with 2 columns. The first column is the area of the house and the second column is the price of the house. In our case, the second column is the column of interest (also our labels). That means once I have a trained linear regression model available, given the area of a house, I should be able to predict its price.

Problem:

Given area of the house, we should be able to obtain a prediction price for the same.

Loading the data set

To simplify the loading headache, we will directly use the pandas package to help us load the data set (it is plainly a one-line piece of code). Pandas will load the dataset straight into memory! You can read more about pandas at https://pandas.pydata.org/.
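As a minimal sketch, loading could look like the snippet below. Note that the file name house_prices.csv and the column names area and price are placeholder names I am assuming for illustration:

```python
import pandas as pd

# Load the CSV into a DataFrame (file name and column names are assumptions)
data = pd.read_csv("house_prices.csv")

X = data["area"].values   # house areas (our features)
Y = data["price"].values  # house prices (our labels)

print(X.shape, Y.shape)
```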

Steps in coding the classifier

We have the following steps in coding the linear regression classifier. Let X denote the area of the house, and Y denote the house price (the labels):

  • We have X and Y loaded and ready
  • Add a bias column to X, i.e. add a column of 1's to X, so our new data matrix will be [column_of_ones X]
  • Define the prediction, i.e. Y = MX + C, where M is the slope and C is the intercept (both floating point numbers!)
  • Define the error/cost. We will be using mean squared error as our cost function
  • Use gradient descent to find the best values for M and C, such that they minimize the error

Don’t worry. We will go through each of these steps in detail (a small sketch of the prediction step follows below). I will also release the final version of this code for guidance and checking purposes.
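To make the prediction step concrete, here is a minimal sketch in Python. The variable names and the example numbers are made up purely for illustration; this is not the final released code:

```python
import numpy as np

def predict(X, M, C):
    """Prediction for linear regression: Y_hat = M * X + C."""
    return M * X + C

# Illustrative usage with made-up values
X = np.array([650.0, 800.0, 1200.0])  # house areas
M, C = 0.5, 10.0                      # slope and intercept (both floats)
print(predict(X, M, C))               # predicted prices
```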

Understanding the bias term in linear regression

Once we know how to load the data, the next step is to add the bias component. Many people have this question: why have a bias term? Well, I am here to give you a convincing answer. What is our learning objective in linear regression? We need to get the best values for M and C, such that Y = MX + C becomes the line of best fit. Now, look at the equation carefully.

Y = M*X + 1*C: written this way, the bias term C is multiplied by a constant 1 for each instance.

Say I don’t have a bias term, i.e. it is zero. Then my equation will be of the form Y = MX. What is the problem with this? It says that when X = 0, Y has to be 0, which implies that the line of best fit has to pass through the origin every time. Got it? This need not always be true: the data need not be spread around the origin, and can be biased away from it. Hence we add the bias term to the model, which can actually help in moving the line of best fit up and down, not constraining it to pass through the origin.
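As a minimal sketch (again using NumPy, and assuming X is a 1-D array of areas), adding the bias column could look like this:

```python
import numpy as np

X = np.array([650.0, 800.0, 1200.0])               # house areas (illustrative)
ones = np.ones((X.shape[0], 1))                    # column of 1's (the bias column)
X_with_bias = np.hstack([ones, X.reshape(-1, 1)])  # [column_of_ones X]
print(X_with_bias)
```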

Defining error in linear regression

We define the error in a very intuitive manner; the simplicity and ease of it will amaze you. We will:

  • Take the difference between the observed (actual) value and the predicted value
  • Sum the squares of the differences (squaring is a must, otherwise errors with different signs may cancel out)
  • Normalize (divide) by the total number of examples

So the error term will be:

Error = (1/N) × ∑ (H(x) − Y)²

where H(x) = MX + C is our prediction, Y is the observed value from our dataset, and N is the total number of instances/examples (written as N here so it does not clash with the slope M). This is how we define the error. Very intuitive, right?
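A minimal sketch of this error computation (NumPy again; the data values are made up for illustration):

```python
import numpy as np

def mean_squared_error(X, Y, M, C):
    """Mean squared error between predictions M*X + C and observed values Y."""
    predictions = M * X + C
    return np.mean((predictions - Y) ** 2)  # (1/N) * sum of squared differences

# Illustrative usage with made-up numbers
X = np.array([650.0, 800.0, 1200.0])
Y = np.array([300.0, 420.0, 630.0])
print(mean_squared_error(X, Y, M=0.5, C=10.0))
```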

Now, how do we start the regression? For that, you will need to wait till the next post, which will get you started with random initialization of values for M and C and iteratively converging to the best solution!

Till then, Bye.