The Machine Learning process

To build a machine learning system, regardless of the field where we want to apply it, we need to follow a similar process or set of steps. Every step is important, and the quality we achieve in each of them will affect the quality of the whole system at the end.

Depending on the literature we check, these steps receive one name or another. The list I present here is just one way of describing the process but, I hope, with the short descriptions, we will be able to match it with any other version out there.

Understanding the problem

The first thing we need to define is what we want to achieve when implementing our machine learning system: what problem we want to solve and what the objective is at the end.

It is not just the definition of these two points; we should add some context too: what resources we have, what costs and benefits the project is going to have, and the criteria we should use to evaluate it, as with any new project we start.

Understanding the data

By now, if we are starting a machine learning project, we should know that data is one of the most important things we need, if not the most important one. This step covers two activities around data:

  • Gathering data: We need to identify our data sources or, if they do not exist, decide how we are going to generate the data. We need to define how we are going to collect and store the data, which usually involves writing some kind of code. And we need to define how we are going to integrate the data, especially if we are gathering it from multiple sources.
  • Exploring data: We need to do a preliminary examination of the data, decide what data we are going to use, see if there is anything that already calls our attention and check whether the data is going to allow us to keep progressing (a quick sketch of this follows the list). For example, if we are building a classification system but we are missing data for one or more of the classes, or there is not enough data for one of them, we should realise it here and try to solve it.
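As an illustration, here is a minimal sketch of this preliminary exam using pandas; the file name and the "class" column are hypothetical, just to show the kind of checks we can run:

```python
import pandas as pd

# Hypothetical file and column names, only for illustration.
df = pd.read_csv("observations.csv")

# Preliminary exam: size, types and basic statistics of the data.
print(df.shape)
print(df.dtypes)
print(df.describe())

# For a classification problem, check that no class is missing
# or clearly under-represented before progressing any further.
print(df["class"].value_counts())
```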

Pre-processing the data

After collecting the data, we need to normalise it so we can process it and achieve optimal results. Removing null values or finding a common scale for numeric values are two common tasks applied here. Another important task that should be performed here is to anonymise the data to comply with any data protection legislation.
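A minimal sketch of these tasks, assuming a hypothetical CSV file with some numeric columns and a couple of identifying columns, could look like this:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("observations.csv")  # hypothetical file name

# Remove rows containing null values.
df = df.dropna()

# Bring every numeric column to a common 0-1 scale.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

# Anonymise: drop columns that directly identify a person
# ("name" and "email" are hypothetical column names).
df = df.drop(columns=["name", "email"], errors="ignore")
```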

Extracting the characteristics

This is one of the most important steps of the process. We should bring in some domain experts (if we do not have them on our team) to help us define the characteristics that we are going to use in our model and that are going to help us solve the problem. We need to identify these characteristics in our data. For example, for a property valuation model, things like the number of bedrooms, the location, the size or how old the property is. All these characteristics will help us to solve our problem.

Selecting the characteristics

Once we have extracted a list of characteristics, we need to find a balance between them and their cost. Computational resources are expensive: the more characteristics we try to process, the more expensive processing them becomes and the worse the performance we are likely to achieve. The challenge here is to select as few characteristics as possible without affecting the final result, or affecting it as little as possible. Ideal candidates for removal are irrelevant, redundant or highly correlated characteristics.

There are three main types of algorithms to select characteristics:

  • Wrappers: They are linked to the algorithm we are going to use and evaluate how effective it is to include new characteristics. Their downside is that they consume a large amount of resources.
  • Filters: They are independent of the algorithm we are going to use and rely on mathematical and statistical techniques to help us select characteristics (see the sketch after this list). They consume fewer resources than the previous case.
  • Hybrids: They are built during the training step, which is why they are a mix of the two previous categories.
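As a rough illustration of the filter approach, this sketch scores each characteristic independently of the final algorithm and keeps only the best ones; the toy data set and the choice of k are arbitrary:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

# Toy regression data set, just to have something runnable.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Filter: score each characteristic with a statistical test and
# keep only the k best ones, independently of the final algorithm.
selector = SelectKBest(score_func=f_regression, k=4)
X_selected = selector.fit_transform(X, y)

print(X.columns[selector.get_support()])  # characteristics kept
```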

Training the algorithm

This is the step where the algorithm starts learning and the model is built. We need to decide what approach we are going to take (classification, regression, …) and, depending on this, we will choose the modelling technique we are going to use or, maybe, try a few of them. And, finally, we build our model, making any necessary tweaks along the way.
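For example, if we decided on a regression approach, a minimal training sketch with scikit-learn (using a toy data set in place of our own) could be:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Keep part of the data aside for the evaluation step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Regression approach, so we pick a regression technique and build the model.
model = LinearRegression()
model.fit(X_train, y_train)
```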

Evaluating the algorithm

Once the model is finished and our algorithm is ready, we need to evaluate how accurate it is. This is usually done with some data we have reserved and never used to train the algorithm. With this data, we check whether the predictions made by the algorithm are good enough.
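Continuing the training sketch above, the evaluation step could look like this; the two metrics shown are just common choices for a regression model:

```python
from sklearn.metrics import mean_absolute_error

# Predict on the reserved data the model has never seen during training.
predictions = model.predict(X_test)

# Compare the predictions with the known labels.
print(mean_absolute_error(y_test, predictions))
print(model.score(X_test, y_test))  # R squared on the reserved data
```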

Analyzing the results

Now that we have results, we need to check them against the success criteria defined initially. Here we should include not just the accuracy; we should also check whether the solution stays within our constraints: performance, costs, …

If everything aligns, we proceed to the next step; if it does not, we need to review the process and the decisions taken, based on the results we have.

Deploying the system

This is the broadest step: communicating results, generating reports, generating documentation, making the system available to the required users, everything else we can think of around these areas, plus any other procedural or administrative task needed to be able to add the seal of done to our system.

As I have said before, you will probably find these steps under different names in other books, articles and publications, but I hope that, with these short explanations, we will be able to match and reference them.

See you.


AI (III): Regression

Regression is one of the techniques we can find in the supervised learning paradigm.

Let’s suppose we have some historical data about the participants of an alcohol effects trial: how much alcohol each of them ingested before showing symptoms of drunkenness and, in addition, some data about the participants themselves, like weight and height.

Now, we want to explore how we could use machine learning to predict how much alcohol a person can ingest before getting drunk.

When we need to predict a numeric value, like an amount of money, a temperature or, in this case, a number of millilitres, we use a supervised learning technique called regression.

Let’s take one of the participants in our study and look at the data that is interesting for us. And, let’s make it simple and take just age, weight, height and percentage of body fat.

What we want is to find a model that can calculate the amount of alcohol a person can drink before showing symptoms of drunkenness.

Age: 30. Weight: 90 kg. Height: 180 cm. Body fat: 23%. Alcohol: 125 ml.

ƒ([30, 90, 180, 23]) = 125

So we need our algorithm to learn the function that operates on all of the participant’s features to give us a result: the amount of alcohol in millilitres.

Of course, a sample of only one person is not likely to give us a function that generalizes well. We need to gather the same sort of data from lots of diverse participants and train our model based on this larger set of data.

ƒ([X1, X2, X3, X4]) = Y

After we have trained the model and we have a generalised function that can be used to calculate our label Y, we can plot the values of Y calculated for specific X feature values on a chart, and we can interpolate for any new values of X to predict an unknown Y.
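A minimal sketch of this idea, using randomly generated placeholder data instead of real trial data (so the learned function is meaningless; it only shows the shape of the calls):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Placeholder data standing in for many participants:
# columns are [age, weight, height, body fat], label is alcohol in ml.
X = rng.uniform([18, 50, 150, 5], [65, 120, 200, 40], size=(200, 4))
y = rng.uniform(50, 300, size=200)

# Train the model: learn f([X1, X2, X3, X4]) = Y.
model = LinearRegression().fit(X, y)

# Predict the label for the participant described above.
print(model.predict([[30, 90, 180, 23]]))
```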


We can use part of our study data to train the model and withhold the rest of the data for evaluating model performance.

Now we can use the model to predict f of x for the evaluation data, and compare the predictions, or scored labels, to the actual labels that we know to be true.

There will be differences between the predicted and actual labels; these are what we call the residuals, and they can tell us something about the level of error in the model.


There are a few ways we can measure the error in the model, and these include root-mean-square error, or RMSE, and mean absolute error, or MAE. Both of these are absolute measures of error in the model.

RMSE = √(1/n ∑ (score − label)²)
MAE = 1/n ∑ |score − label|

For example, an RMSE value of 5 would mean that the standard deviation of the prediction errors on our test data is 5 millilitres.

The problem is that absolute values can vary wildly depending on what you are predicting. An error of 5 in one model can mean nothing, but in a different model it can be a big difference. So we might want to evaluate the model using relative metrics that indicate a more general level of error, as a relative value between 0 and 1.

Relative absolute error, or RAE, and relative squared error, or RSE, produce a metric where the closer the error is to 0, the better the model:

RAE = ∑ |score − label| / ∑ |label|
RSE = ∑ (score − label)² / ∑ label²

And the coefficient of determination, which we sometimes call R squared, is another relative metric, but this time a value closer to 1 indicates a good fit for the model.

CoD (R²) = 1 − var(score − label) / var(label)
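Putting the simplified formulas above into code, a small sketch to compute all of these metrics from a list of scores and labels (the values in the example call are just placeholders) could be:

```python
import numpy as np

def regression_metrics(score, label):
    score = np.asarray(score, dtype=float)
    label = np.asarray(label, dtype=float)
    residuals = score - label
    rmse = np.sqrt(np.mean(residuals ** 2))
    mae = np.mean(np.abs(residuals))
    rae = np.sum(np.abs(residuals)) / np.sum(np.abs(label))
    rse = np.sum(residuals ** 2) / np.sum(label ** 2)
    r2 = 1 - np.var(residuals) / np.var(label)
    return {"RMSE": rmse, "MAE": mae, "RAE": rae, "RSE": rse, "R2": r2}

# Placeholder scores and labels, only to show how the function is used.
print(regression_metrics([120, 150, 90], [125, 140, 100]))
```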

AI (I): Machine learning

Machine learning provides the foundation for artificial intelligence. So, what is it?

Machine learning is a technique in which we train a software model using data. The model learns from the training cases and then, we can use the trained model to make predictions for new data cases. To have a computer make intelligent predictions from the data, we just need a way to train it to perform the correct calculations.

We usually start with a data set that contains historical records, often called cases or observations. Each observation includes numeric features that quantify a characteristic of the item we are working with; we can call them ‘X’. In addition, we also have some value that we are trying to predict, which we can call ‘Y’. The purpose is to use our training cases to train a machine learning model so it can calculate a value for ‘Y’ from the features in ‘X’. As a simplification, we are creating a function that operates on a set of features, ‘X’, to produce predictions, ‘Y’.

Generally speaking, there are two broad kinds of machine learning, supervised and unsupervised.

In supervised learning scenarios, we start with observations that include known values, called labels, for the variable we want to predict. Because we already know the label we are trying to predict, the first thing we need to do is split our data: we can train the model using half of the data and keep the rest to test the performance of our model. When we obtain the desired results and we are confident our model works, we can use it with new observations for which the label is unknown, and generate new predicted values.

Unsupervised learning is different from supervised learning, in that this time we do not have known label values in the training data set. We train the model by finding similarities between the observations. After the model is trained, each new observation is assigned to the cluster of observations with the most similar characteristics.


Machine learning branches

In machine learning we can find three main branches into which we can classify the algorithms:

  • Supervised learning.
  • Unsupervised learning.
  • Reinforcement learning.

Supervised learning

In supervised algorithms, you know the input and the output that you need from your model. You do not know how the output is obtained from the input data or what the inner relations among your data are, but you definitely know the output data.

As an example, we can take a magazine that has subscription data for a certain number of current and former customers, let’s say 100,000 customers. The company in charge of the magazine knows that half of these customers (50,000) have cancelled their subscriptions and the other half (50,000) are still subscribed, and it wants a model to predict which customers will cancel their subscriptions.

We know the input, the customers’ subscription data, and the output: cancelled or not.

We can then build our training data set with the data of 90,000 customers, half of them cancelled and half of them still active, and train our system with this training set. After that, we will try to predict the result for the other 10,000 customers we left out of the training data to check the accuracy of our model (see the sketch below).
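A rough sketch of this workflow with scikit-learn, using randomly generated placeholder data in place of the real subscription data (so the accuracy printed is meaningless; it only shows the train/test split idea):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder subscription data: 100,000 customers, a few numeric
# features each, and a 0/1 label (1 = cancelled the subscription).
X = rng.normal(size=(100_000, 5))
y = rng.integers(0, 2, size=100_000)

# Train on 90,000 customers, keep 10,000 aside to check the accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```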

Unsupervised learning

In unsupervised learning algorithms, you do not know what the output of your model is; you may know there is some kind of relation or correlation in your data but, maybe, the data is too complex to guess it.

With this kind of algorithm, you normalise your data so that it can be compared and you wait for the model to find some of these relationships. One of the special characteristics of these models is that, while the model can suggest different ways to categorise or group your data, it is up to you to do further research on them to unveil something useful.

For example, we can have a company selling a huge number of products that wants to improve how it targets customers with useful advertising campaigns. We can give our algorithm the customer data and the algorithm can suggest some relations: age range, location, …
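As a sketch of the idea, assuming a hypothetical table with three numeric columns per customer, we can normalise the data and let a clustering algorithm suggest groups:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder customer data: age, yearly spend and number of orders.
customers = rng.normal(loc=[40, 500, 12], scale=[12, 200, 5], size=(1_000, 3))

# Normalise so the features can be compared, then let the model suggest
# groups; it is up to us to research what each group actually means.
scaled = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

print(np.bincount(segments))  # how many customers fall in each suggested group
```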

Reinforcement learning

In reinforcement learning, algorithms do not receive an immediate reward for their actions, and they need to accumulate several consecutive decisions to know whether those actions/decisions were correct or not. In this scenario, there is no supervisor, the feedback about each decision is delayed and the agent’s actions affect the subsequent data it receives.

One example of this is the game of chess, where the algorithm is going to make decisions but, until the end of the game, it is not going to be able to know whether those decisions were correct or not and, obviously, previous decisions affect subsequent ones.
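Chess is far too big for a short example, but the same delayed-reward idea can be sketched with tabular Q-learning on a tiny corridor, where the only reward appears at the far end and has to be propagated back to earlier decisions; all the numbers here are arbitrary choices:

```python
import numpy as np

# A corridor of 6 positions; the only reward is at the far right, so the
# agent only finds out at the end whether its sequence of moves was good.
n_states, n_actions = 6, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # estimated value of each action per position
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state < n_states - 1:
        # Epsilon-greedy choice, breaking ties between equal values at random.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0  # delayed reward
        # Q-learning update: the delayed reward is propagated backwards over time.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))  # learned move per position (1 = move right)
```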


Machine learning vs deep learning

AI (artificial intelligence) has been one of the buzzwords of recent years and it looks like it is going to continue being so for, at least, a few more. Leaving terminology aside, the truth is that this kind of technology offers great opportunities. Lately, there are two terms that we can hear quite often and almost everywhere. These two terms, related to AI, are:

  • Machine learning
  • Deep learning

Both terms are related; in fact, we can say that machine learning encompasses deep learning. Both refer to systems that can learn by themselves; the difference is the way they learn. As a quick explanation, we can say that deep learning is more complex, more sophisticated and more autonomous: once a deep learning system is implemented, the need for human intervention is minimal.

Machine learning

The main characteristic that differentiates these systems from other, less advanced ones is their ability to learn by themselves. The system’s algorithm receives a set of rules to apply to the data but, and this is the special thing about this kind of system, it can adapt these rules or develop new ones to increase its success rate.

For example, let’s say we write a system to identify cat pictures (the Internet loves cat pictures, we know). We can ask the system to look for some patterns: four legs, hair, nose, ears, tail, two eyes… All characteristics that cats usually have. After that, we can train the algorithm with a training set, telling the system whether each picture contains a cat or not. With this, we allow the system to create or adapt its own rules to make the task easier when new, unseen pictures are given to it.

Deep learning

The difference between machine learning and deep learning is that the latter takes the learning part to a more advanced level. In this case, the system has layers of neural units trying to imitate the brain’s behaviour.

In deep learning, each layer processes the information and returns a result as a percentage. For example, this picture has an 87% chance of being a cat and a 13% chance of not being one. The next layer analysing the image will take this value and combine it with its own value. With this, the percentage will vary, and this new value will be sent to the next layer to perform a similar process. This process continues layer after layer.
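A very small numpy sketch of this layered idea, with random placeholder weights instead of learned ones (so the final percentage means nothing; it only shows how each layer takes the previous layer's result and passes its own forward):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Placeholder input: a flattened 8x8 "picture" (64 pixel values).
picture = rng.random(64)

# Three layers with random placeholder weights; in a real system these
# weights are learned from a (preferably very large) training set.
w1 = rng.normal(size=(64, 16))
w2 = rng.normal(size=(16, 8))
w3 = rng.normal(size=(8, 1))

# Each layer processes the previous layer's result and passes it on.
layer1 = sigmoid(picture @ w1)
layer2 = sigmoid(layer1 @ w2)
output = sigmoid(layer2 @ w3)

print(f"Chance the picture is a cat: {float(output[0]):.0%}")
```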

All these consecutive analyses performed by the different layers reduce the error rate and increase the number of correct conclusions. To train the system we will again use a training set and, especially in this case, the bigger, the better.
