The Math behind Linear Regression

João Augusto Perin
4 min read · Sep 21, 2021

Recently I’ve been learning a lot about Artificial Intelligence. I probably haven’t told you on this Medium blog of mine that I already had some experience with related topics back in 2019. But at that time, I was just playing around with some libraries, and every time I had to get close to anything mathematical, it scared me to the bone.

So, 2021 came, and I’m getting back on my road of learning it, at a slower pace, but having a great time. Without further ado, let’s jump into it:

Linear Regression: What is it?

Think about this case for a second: we have 3 points (A, B & C) plotted somewhere in space.

Our goal is to find a single line that best approximates them. But how do we adjust it, or even better, how do we teach the machine to adjust it on its own?

That is when Linear Regression comes into play.

It is a simple, yet very nice, mathematical question. It’s not exactly like the machine is actually learning anything; it is more like adapting itself. For this reason, someone could argue that this is not AI, but it is certainly AI-related, at least.

And how does this work?

Hehe, let’s get to the fun part. First of all, you’re going to need some mathematical background, but not much. I’ll list the topics, but you don’t have to master anything, just have some notion of what these things are:

  • Derivatives
  • Functions
  • Linear Equations
  • Linear Algebra (points and lines)

Get your head around these concepts, and let’s go:

So, first of all, you have a function, a function that describes a line (wow). Remember that one of the forms in which a line can be described is:

f(x) = ax + b

So, plotting it, and assuming random values for a & b, we would get a graph like this:
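(As a minimal sketch of that picture in Python — the points A, B & C and the starting values of a and b below are made up purely for illustration:)

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical points A, B & C (made-up values, just for illustration)
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.5, 4.0])

# Random-ish starting values for a and b
a, b = 0.5, 1.0

# The line we will later adjust: f(x) = ax + b
def f(x):
    return a * x + b

plt.scatter(xs, ys, label="points A, B & C")
plt.plot(xs, f(xs), label="f(x) = ax + b")
plt.legend()
plt.show()
```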

If we have the points that we’d like to approach, how can we calculate our error, i.e., by how much we missed them? Well, let’s find out:

We can get the value of our function f(x) at any point, let’s say x = 1, and then subtract the actual value of the point we wanted to reach. This way, the error would be:

y - yi

y being the value of f(x), and yi being the point itself.

But think about it: on some occasions this could mislead us. Maybe our line sits below a point, so the difference comes out negative, and mixing positive and negative errors would throw off our interpretation. So, we can do the following:

(y - yi)²

This way, we always get a non-negative value.

And so, we can conclude that our cost function is actually sum((y - yi)²), summed over all the points.
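(In code, this cost — often called the sum of squared errors — could look like this minimal sketch, reusing the hypothetical points from above:)

```python
import numpy as np

def cost(a, b, xs, ys):
    # y holds the values our line predicts; ys holds the points themselves
    y = a * xs + b
    return np.sum((y - ys) ** 2)

# Same hypothetical points as before
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.5, 4.0])
print(cost(0.5, 1.0, xs, ys))  # how far our initial line misses the points
```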

(I truly hope this was understandable.)

Ok, and now?

We already know our cost function; now we have to use it to adapt our line function so it approaches our goals (our points).

Remember I said that we could change the values of a & b in the base function (f(x) = ax + b)?

Yeah, we can, but how should we?

Well, that is when derivatives come into play. In case you don’t know the concept: the derivative of a function gives its rate of change. So, we can change the a and the b of our base function according to these rates, and this way the cost gets closer and closer to zero (in the real world, it probably won’t get to EXACTLY zero).
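For our cost sum((y - yi)²), with y = ax + b, the chain rule gives us exactly the two rates we need:

d(cost)/da = sum(2 * x * (y - yi))

d(cost)/db = sum(2 * (y - yi))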

Now, we just use the derivatives to make the correct adjustments to our variables.

So, think about the expression that generated the line:

reg = ax + b

then, on each step, we can update the values to be:

a = a - ([BIAS] * <DERIVATIVE-OF-COST-RELATIVE-TO-A>)

b = b - ([BIAS] * <DERIVATIVE-OF-COST-RELATIVE-TO-B>)

And the BIAS here is a constant that controls how big each adjustment is; in other words, it sets a "weight" on it (usually a value between 0 and 1 — in most texts this constant is called the learning rate).
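(Putting it all together, here’s a minimal gradient descent sketch — the BIAS value, step count, and points are arbitrary choices for illustration, not the one "right" setup:)

```python
import numpy as np

# Hypothetical points to fit (same made-up values as before)
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.5, 4.0])

a, b = 0.5, 1.0  # starting line
BIAS = 0.01      # the adjustment "weight" (usually called the learning rate)

for _ in range(1000):
    y = a * xs + b                      # current predictions of the line
    grad_a = np.sum(2 * xs * (y - ys))  # derivative of cost relative to a
    grad_b = np.sum(2 * (y - ys))       # derivative of cost relative to b
    a = a - BIAS * grad_a
    b = b - BIAS * grad_b

print(a, b)  # the adjusted line should now pass close to A, B & C
```

After enough steps, the cost stops shrinking noticeably and the line settles as close to the points as a straight line can get.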

Conclusion

Maths is an incredible and exciting field, and it is so huge that it’s nearly impossible to cover all of the details at once, but I tried to summarize the most important aspects of Linear Regression in an easy way. Hope it helps! 😃
