Regression is one of the techniques we can find in the supervised learning paradigm.
Let’s suppose we have historical data from participants in trials on the effects of alcohol, including the amount of alcohol each of them ingested before showing symptoms of drunkenness. We also have some data about the participants themselves, such as weight and height.
Now we want to explore how we could use machine learning to predict how much alcohol a person can ingest before getting drunk.
When we need to predict a numeric value, like an amount of money, a temperature or, in this case, a number of milliliters, we use a supervised learning technique called regression.
Let’s take one of the participants in our study and look at the data that is interesting for us. To keep it simple, we will use just age, weight, height and percentage of body fat.
What we want is to find a model that can calculate the amount of alcohol a person can drink before showing symptoms of drunkenness.
Age: 30. Weight: 90 kg. Height: 180 cm. Body fat: 23%. Alcohol: 125 ml.
ƒ([30, 90, 180, 23]) = 125
So we need our algorithm to learn the function that operates on all of the participant’s features and gives us the amount of alcohol in milliliters.
Of course, a sample of only one person is not likely to give us a function that generalizes well. We need to gather the same sort of data from lots of diverse participants and train our model based on this larger set of data.
ƒ([X1, X2, X3, X4]) = Y
After we have trained the model and have a generalized function that can calculate our label Y, we can plot the values of Y calculated for specific values of the features X on a chart. We can also apply the function to any new values of X to predict an unknown Y.
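As a minimal sketch of this idea, assuming the function is linear in the features (ordinary least squares, which the text does not prescribe), the Python below fits ƒ to a few hypothetical participants and then predicts Y for a new one. The participant values and the labelling rule `true_rule` are invented purely for illustration, and `solve`, `fit_linear` and `predict` are hypothetical helpers written with only the standard library.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy, inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(features: List[List[float]], labels: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [bias, w1, ..., wk]."""
    rows = [[1.0] + list(x) for x in features]  # leading 1.0 carries the bias term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: List[float], x: List[float]) -> float:
    """Evaluate the learned function f(x) = bias + w·x."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


# Hypothetical participants: [age, weight_kg, height_cm, body_fat_pct].
X = [
    [30, 90, 180, 23], [25, 70, 175, 18], [40, 85, 170, 28], [35, 60, 160, 30],
    [50, 95, 185, 25], [28, 75, 178, 15], [45, 68, 165, 33], [33, 80, 172, 20],
]


def true_rule(x: List[float]) -> float:
    """Invented rule (ml of alcohol) used only to generate example labels."""
    age, weight, height, fat = x
    return 10 + 0.5 * age + 1.2 * weight + 0.1 * height - 1.5 * fat


Y = [true_rule(x) for x in X]
weights = fit_linear(X, Y)
new_participant = [38, 82, 176, 22]
print([round(w, 3) for w in weights])
print(round(predict(weights, new_participant), 2))
```

Because the example labels come from an exact linear rule, the fitted weights should land very close to that rule's coefficients; with real, noisy trial data they would only approximate the underlying relationship. In practice a library class such as scikit-learn's `LinearRegression` would normally replace the hand-written solver.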

We can use part of our study data to train the model and withhold the rest of the data for evaluating model performance.
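A minimal sketch of holding data back for evaluation, using only Python's standard library. `train_test_split` here is a hypothetical helper (scikit-learn ships a function with the same name in `sklearn.model_selection`, which you would normally use instead), and the 30% test fraction and the seed are arbitrary illustrative choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and hold back `test_fraction` of them for evaluation."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed => reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical participant IDs standing in for full participant records.
participants = list(range(10))
train, test = train_test_split(participants, test_fraction=0.3, seed=42)
print(len(train), len(test))  # 7 3
```

Shuffling before splitting matters when the records are stored in some order (for example by trial date), so that the evaluation set is not systematically different from the training set.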
Now we can use the model to predict ƒ(X) for the evaluation data and compare the predictions, or scored labels, to the actual labels that we know to be true.
The predicted and actual labels will usually differ; these differences are what we call the residuals, and they tell us something about the level of error in the model.
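To make the residuals concrete, here is a small sketch with made-up evaluation values (they are not from any real trial); `residuals` is a hypothetical helper. It uses the common convention residual = actual label minus predicted label. The error formulas that follow are written as score minus label, but the sign makes no difference there because each difference is either squared or taken as an absolute value.

```python
from typing import List


def residuals(labels: List[float], scores: List[float]) -> List[float]:
    """Residual = actual label - predicted (scored) label, one per evaluation row."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    return [y - s for y, s in zip(labels, scores)]


# Hypothetical evaluation data, in milliliters.
actual = [125.0, 140.0, 98.0, 110.0]
predicted = [120.0, 146.0, 101.0, 109.0]
print(residuals(actual, predicted))  # [5.0, -6.0, -3.0, 1.0]
```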

There are a few ways we can measure the error in the model, and these include root-mean-square error, or RMSE, and mean absolute error, or MAE. Both of these are absolute measures of error in the model.
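$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(score_i - label_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|score_i - label_i\right|$$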
For example, an RMSE of 5 means that our predictions on the test data are typically off by about 5 milliliters; when the errors average out to zero, RMSE equals the standard deviation of the errors.
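A minimal sketch of both metrics in plain Python: RMSE as the square root of the mean of the squared differences, and MAE as the mean of the absolute differences. The scores and labels are invented values in milliliters, and `rmse` and `mae` are illustrative helper names; scikit-learn offers `mean_squared_error` and `mean_absolute_error` in `sklearn.metrics` if you prefer a library.

```python
import math
from typing import Sequence


def rmse(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Root-mean-square error: square root of the mean squared difference."""
    n = len(labels)
    return math.sqrt(sum((s - y) ** 2 for s, y in zip(scores, labels)) / n)


def mae(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Mean absolute error: average size of the differences, ignoring sign."""
    n = len(labels)
    return sum(abs(s - y) for s, y in zip(scores, labels)) / n


scores = [120.0, 146.0, 101.0, 109.0]  # hypothetical predictions (ml)
labels = [125.0, 140.0, 98.0, 110.0]   # hypothetical actual values (ml)
print(round(rmse(scores, labels), 3))  # 4.213
print(mae(scores, labels))             # 3.75
```

RMSE is never smaller than MAE; the gap grows when a few predictions are badly wrong, because squaring weights large errors more heavily.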
The problem is that absolute values can vary wildly depending on what you are predicting: an error of 5 can be negligible in one model but a big difference in another. So we might want to evaluate the model using relative metrics, which express the level of error as a relative value, typically between 0 and 1.
Relative absolute error, or RAE, and relative squared error, or RSE, produce a metric where the closer the error is to 0, the better the model.
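$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\left|score_i - label_i\right|}{\sum_{i=1}^{n}\left|label_i - \overline{label}\right|}$$
$$\mathrm{RSE} = \frac{\sum_{i=1}^{n}\left(score_i - label_i\right)^2}{\sum_{i=1}^{n}\left(label_i - \overline{label}\right)^2}$$
where $\overline{label}$ is the mean of the actual labels.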
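The sketch below computes RAE and RSE with the common definitions in which the model's total error is divided by the total error of a naive model that always predicts the mean label (absolute errors for RAE, squared errors for RSE). The values are the same kind of invented scores and labels as before, and `rae` and `rse` are hypothetical helper names.

```python
from typing import Sequence


def rae(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Relative absolute error: model's absolute error vs. a mean-only predictor's."""
    mean = sum(labels) / len(labels)
    model_err = sum(abs(s - y) for s, y in zip(scores, labels))
    baseline_err = sum(abs(y - mean) for y in labels)
    return model_err / baseline_err


def rse(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Relative squared error: model's squared error vs. a mean-only predictor's."""
    mean = sum(labels) / len(labels)
    model_err = sum((s - y) ** 2 for s, y in zip(scores, labels))
    baseline_err = sum((y - mean) ** 2 for y in labels)
    return model_err / baseline_err


scores = [120.0, 146.0, 101.0, 109.0]  # hypothetical predictions (ml)
labels = [125.0, 140.0, 98.0, 110.0]   # hypothetical actual values (ml)
print(round(rae(scores, labels), 3))  # 0.263
print(round(rse(scores, labels), 3))  # 0.071
```

Because both metrics compare against the mean-only baseline, a value above 1 would mean the model does worse than simply predicting the average.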
And the coefficient of determination, which we sometimes call R squared, is another relative metric, but this time a value closer to 1 indicates a good fit for the model.
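$$\mathrm{CoD} = R^2 = 1 - \frac{\sum_{i=1}^{n}\left(score_i - label_i\right)^2}{\sum_{i=1}^{n}\left(label_i - \overline{label}\right)^2}$$
with $\overline{label}$ the mean of the actual labels.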
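A minimal sketch of the coefficient of determination, computed as one minus the ratio of the model's squared-error sum to the sum of squared deviations of the labels from their mean. The data are invented and `r_squared` is a hypothetical helper; `sklearn.metrics.r2_score` computes the same quantity.

```python
from typing import Sequence


def r_squared(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(labels) / len(labels)
    ss_res = sum((s - y) ** 2 for s, y in zip(scores, labels))
    ss_tot = sum((y - mean) ** 2 for y in labels)
    return 1.0 - ss_res / ss_tot


scores = [120.0, 146.0, 101.0, 109.0]  # hypothetical predictions (ml)
labels = [125.0, 140.0, 98.0, 110.0]   # hypothetical actual values (ml)
print(round(r_squared(scores, labels), 3))  # 0.929
```

A model that always predicts the mean label scores exactly 0 and a perfect model scores 1; R² can even be negative for a model that does worse than the mean.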
[Updated: Correction in the second error chart, where there was a typo: MAE -> RSE. Thanks to Michael Mora for the comment]