Now it is time to start digging into the theory of machine learning.
In the machine learning world, precision is everything. When we develop a model, we try to make it as accurate as possible by tuning its different parameters. But the hard truth is that we can never build a one-hundred-per-cent accurate model, because we cannot build a model that is free of errors. What we can do is try to understand the possible sources of error, and that understanding will help us build a more precise model.
Types of errors
When we talk about errors, we find two kinds: reducible and irreducible errors.
Irreducible errors are errors that cannot be reduced no matter which algorithm we apply. They are usually known as noise, and they can appear in our models due to multiple factors such as an unknown variable, incomplete features or a wrongly defined problem. It is important to mention that, no matter how good our model is, our data will always carry some noise component, some irreducible error, that we can never remove.
Reducible errors have two components: bias and variance. This kind of error derives from the choice of algorithm, and the presence of bias or variance causes underfitting or overfitting of the data.
Bias
Bias error is the difference between the expected prediction of our model and the real values or, put differently, how far the predicted values are from the actual values. High bias, where the predicted values are far from the actual values, causes the algorithm to miss the relevant relationships between the input and output variables. When a model has high bias, it means the model is too simple and does not capture the complexity of the data, thus underfitting it. This happens, for example, when we try to fit a linear regression to a set of data that has a non-linear pattern.
High-bias algorithms tend to be simple and rigid. Examples are linear regression, logistic regression and linear discriminant analysis.
Low bias implies the opposite: the model offers more flexibility. Examples are decision trees, k-nearest neighbours (KNN) and support vector machines.
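As a minimal sketch of that underfitting example (assuming scikit-learn and NumPy are available; the data below is synthetic and invented just for illustration), we can fit a plain linear regression to points drawn from a sine curve and see that it scores poorly even on its own training data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic non-linear data: a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A straight line is too simple for this pattern: high bias, underfitting
linear = LinearRegression().fit(X, y)
print("Training R^2 of the linear model:", linear.score(X, y))  # stays low
```

Because the straight line cannot bend to follow the curve, more training data will not fix it: the error here is dominated by bias.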
Variance
Variance refers to the differences in the estimate of the function when we use different training data or, put differently, it tells us how scattered the predicted values are around the actual values. We see its effect when the model performs well on the dataset it was trained on but does not do well on a dataset it was not trained on. Ideally, the result should not change too much from one set of data to another.
High variance causes overfitting, which means that the algorithm models the random noise present in the training data, or that the algorithm is strongly dependent on the input data. It implies big changes in the estimate of the function when the training data changes. Examples are decision trees, k-nearest neighbours (KNN) and support vector machines.
Low variance implies small changes in the estimate of the function when the training data changes. Examples are linear regression, linear discriminant analysis and logistic regression.
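As a hedged sketch of the high-variance case (synthetic data again, and the train/test split is an arbitrary choice), an unconstrained decision tree can memorise its training set almost perfectly while scoring noticeably worse on held-out data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# The same kind of synthetic noisy sine data as before
rng = np.random.RandomState(0)
X = rng.uniform(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree fits the training noise: low bias, high variance
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
print("Training R^2:", tree.score(X_train, y_train))  # close to 1.0
print("Test R^2:    ", tree.score(X_test, y_test))    # noticeably lower
```

The gap between the two scores is exactly the behaviour described above: the model's predictions change a lot depending on which data it saw.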
Bias–variance tradeoff
The objective of any machine learning algorithm is to achieve low bias and low variance while, at the same time, achieving good predictive performance. The bias-variance dilemma, or bias-variance problem, is the conflict we face when trying to minimise these two sources of error simultaneously, the two sources that prevent supervised learning algorithms from generalising beyond their training set. Roughly speaking, bias versus variance corresponds to accuracy versus consistency of the trained models. Considering the possible combinations, we can have:
- High bias, low variance: models are consistent but inaccurate on average. They tend to be less complex, with a simple or rigid structure, like linear regression or Bayesian linear regression.
- Low bias, high variance: models are somewhat accurate but inconsistent on average. They tend to be more complex, with a flexible structure, like decision trees or k-nearest neighbours (KNN).
- High bias, high variance: models are inaccurate and also inconsistent on average.
- Low bias, low variance: this is the unicorn.
To build a good model we need to find a balance between bias and variance that helps us minimise the total error. This is why understanding bias and variance is fundamental to understanding a model's behaviour.
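One way to see this balance is to sweep the model complexity and watch both errors at the same time. The following is only a sketch on synthetic data, and the list of polynomial degrees is an arbitrary choice; typically the training score keeps improving with complexity while the validation score improves, peaks and then degrades:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
X = np.sort(rng.uniform(0, 6, 120)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=120)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=1)

# Low degrees underfit (high bias); very high degrees start to overfit (high variance)
for degree in [1, 3, 5, 10, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  train R^2={model.score(X_train, y_train):.2f}  "
          f"val R^2={model.score(X_val, y_val):.2f}")
```

The degree with the best validation score, not the best training score, is the one sitting closest to the sweet spot of the tradeoff.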
Detecting high bias or high variance
High bias can be identified when we have:
- High training error.
- A validation or test error that is roughly as high as the training error.
High variance can be identified when we have:
- Low training error.
- High validation or test error (a sketch after this list puts both checks into code).
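As a minimal sketch of these two checks (the diagnose helper and its thresholds are hypothetical rules of thumb invented for this post, not standard values from any library), we can compare training and validation error against a naive baseline:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def diagnose(model, X_train, y_train, X_val, y_val):
    """Very informal diagnosis from train vs validation error; thresholds are arbitrary."""
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    # Error of a naive model that always predicts the training mean, used as a reference
    baseline = mean_squared_error(y_train, np.full_like(y_train, y_train.mean()))

    print(f"train MSE={train_err:.3f}  validation MSE={val_err:.3f}  baseline MSE={baseline:.3f}")
    if val_err > 1.5 * train_err:
        print("-> low training error but much higher validation error: likely high variance")
    elif train_err > 0.5 * baseline:
        print("-> high training error, validation close to it: likely high bias")
    else:
        print("-> errors are low and close to each other: looks reasonably balanced")
```

Calling it on the decision tree from the earlier sketch should flag the high-variance pattern, while calling it on the plain linear model fitted to the sine data should flag high bias.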
Fixing it
High bias comes from a model that is too simple, and it shows up as a high training error. To fix it we can do the following things (see the sketch after this list):
- Add more input features.
- Add more complexity by introducing polynomial features.
- Decrease the regularisation term.
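As a hedged sketch of the second and third points (the degree and alpha values are arbitrary choices, and Ridge is used simply as a linear model whose regularisation term we can tune), adding polynomial features and lowering the regularisation strength makes the model flexible enough to follow the non-linear pattern:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic non-linear data, as in the earlier sketches
rng = np.random.RandomState(2)
X = np.sort(rng.uniform(0, 6, 150)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=150)

# A heavily regularised model with only the raw feature underfits (high bias)
simple = Ridge(alpha=10.0).fit(X, y)

# More input complexity (polynomial features) plus a smaller regularisation term
richer = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=0.1)).fit(X, y)

print("simple model R^2:", simple.score(X, y))  # low: too rigid for the pattern
print("richer model R^2:", richer.score(X, y))  # higher: the bias has been reduced
```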
High variance is due to a model that tries to fit most of the training data points and hence becomes too complex. To resolve the high-variance issue we can work on the following (see the sketch after this list):
- Getting more training data.
- Reducing the input features.
- Increasing the regularisation term.
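And a counterpart sketch for the overfitting case (synthetic data once more; the degree and alpha values are arbitrary): with the same flexible model, increasing the regularisation term typically narrows the gap between training and test score:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.RandomState(3)
X = rng.uniform(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Same flexible model, two regularisation strengths: a larger alpha usually
# shrinks the gap between training and test score, i.e. reduces the variance
for alpha in [1e-6, 10.0]:
    model = make_pipeline(
        PolynomialFeatures(degree=15, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    ).fit(X_train, y_train)
    print(f"alpha={alpha:g}  train R^2={model.score(X_train, y_train):.2f}  "
          f"test R^2={model.score(X_test, y_test):.2f}")
```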
That is all for today. I hope this first theory article was not too hard to read. I will try to keep these articles not too long and as concise as possible.