AI (III): Regression

Regression is one of the techniques we can find in the supervised learning paradigm.

Let’s suppose we have historical data about the participants in an alcohol-effects trial, including the amount of alcohol each of them ingested before showing symptoms of drunkenness. In addition, we have some data about the participants themselves, such as their weight and height.

Now, we want to explore how we could use machine learning to predict how much alcohol a person can ingest before getting drunk.

When we need to predict a numeric value, such as an amount of money, a temperature or, in this case, a number of milliliters, we use a supervised learning technique called regression.

Let’s take one of the participants in our study and check which data is interesting for us. To keep things simple, we will use just age, weight, height and percentage of body fat.

What we want is to find a model that can calculate the amount of alcohol a person can drink before showing symptoms of drunkenness.

Age: 30. Weight: 90 kg. Height: 180 cm. Body fat: 23%. Alcohol: 125 ml.

ƒ([30, 90, 180, 23]) = 125

So we need our algorithm to learn the function that operates on all of the participant’s features to give us the amount of alcohol in milliliters.

Of course, a sample of only one person is not likely to give us a function that generalizes well. We need to gather the same sort of data from lots of diverse participants and train our model based on this larger set of data.

ƒ([X1, X2, X3, X4]) = Y

After we have trained the model and have a generalized function that can calculate our label Y, we can plot the values of Y calculated for specific values of the features X on a chart. We can then interpolate any new values of X to predict an unknown Y.
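The text does not say which kind of function the model learns. As a minimal sketch, assuming a linear model ƒ(X) = b0 + b1·X1 + b2·X2 + b3·X3 + b4·X4 (ordinary least-squares linear regression, one common regression technique), the Python below fits the weights with the normal equations using only the standard library. The participant rows and the `toy_rule` that generates their labels are made up for illustration, and `solve`, `fit_linear` and `predict` are hypothetical helper names.

```python
# Hypothetical sketch: learn a linear f(X) = Y by least squares (normal equations).


def solve(a, b):
    """Solve the square linear system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_linear(xs, ys):
    """Return weights [b0, b1, ..., bk] that minimise the squared error on (xs, ys)."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 multiplies the intercept b0
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, x):
    """Apply the learned function f to one feature vector x."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


# Made-up participants [age, weight_kg, height_cm, body_fat_pct], not real study data.
xs = [
    [30, 90, 180, 23],
    [25, 60, 165, 20],
    [40, 75, 170, 30],
    [35, 100, 190, 18],
    [50, 80, 175, 25],
    [22, 70, 178, 15],
]


def toy_rule(x):
    """Invented rule used only to generate labels, so we can check the fit recovers it."""
    age, weight, height, fat = x
    return 10 + 1.5 * age + 0.8 * weight + 0.1 * height - 2 * fat


ys = [toy_rule(x) for x in xs]
weights = fit_linear(xs, ys)
estimate = predict(weights, [28, 85, 182, 21])  # close to toy_rule([28, 85, 182, 21]) = 96.2
```

In practice you would normally rely on a numerical library instead of hand-written elimination; spelling it out just shows that "learning the function" here means choosing the weights that minimise the squared differences between predictions and labels.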


We can use part of our study data to train the model and withhold the rest of the data for evaluating model performance.
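A minimal Python sketch of withholding part of the data, assuming a random split with 30% of the cases held back for evaluation (both choices are arbitrary, not from the text). `holdout_split` is a hypothetical helper, and the integer ids stand in for full participant records.

```python
import random


def holdout_split(cases, test_fraction=0.3, seed=0):
    """Shuffle a copy of the cases and hold back test_fraction of them for evaluation."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, evaluation set)


participants = list(range(10))  # hypothetical ids for ten participants
train, test = holdout_split(participants)
```

Shuffling before splitting avoids accidentally training on, say, only the first participants recruited.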

Now we can use the model to predict f(x) for the evaluation data, and compare the predictions, or scored labels, to the actual labels that we know to be true.

The differences between the predicted and actual labels are what we call the residuals, and they can tell us something about the level of error in the model.
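Computing residuals is a one-liner once we have scored and actual labels. This sketch uses made-up values in milliliters and takes the difference as score minus label, matching the error formulas in this post; many texts use label minus score instead, which only flips the sign.

```python
def residuals(scores, labels):
    """Return score - label for each evaluation case."""
    return [s - l for s, l in zip(scores, labels)]


scores = [120.0, 98.0, 143.0]  # hypothetical predicted amounts (ml)
labels = [125.0, 95.0, 140.0]  # hypothetical actual amounts (ml)
res = residuals(scores, labels)  # [-5.0, 3.0, 3.0]
```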


There are a few ways we can measure the error in the model, including root-mean-square error (RMSE) and mean absolute error (MAE). Both of these are absolute measures of error: they are expressed in the same units as the label, here milliliters.

RMSE = √(1/n ∑(score - label)^2)
MAE = 1/n ∑ abs(score - label)
where n is the number of evaluation cases.
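A direct Python translation of the two metrics, averaging over the n evaluation cases. The scores and labels are invented so that one prediction is off by 10 ml and the rest are exact, which shows how RMSE weighs a single large error more heavily than MAE does.

```python
import math


def rmse(scores, labels):
    """Root-mean-square error: square root of the mean squared difference."""
    n = len(labels)
    return math.sqrt(sum((s - l) ** 2 for s, l in zip(scores, labels)) / n)


def mae(scores, labels):
    """Mean absolute error: mean of the absolute differences."""
    n = len(labels)
    return sum(abs(s - l) for s, l in zip(scores, labels)) / n


scores = [125, 100, 140, 110]  # hypothetical predictions (ml)
labels = [125, 100, 140, 100]  # hypothetical actual values (ml)
print(rmse(scores, labels))  # 5.0
print(mae(scores, labels))   # 2.5
```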

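For example, an RMSE of 5 would mean that our predictions on the test data are typically off by about 5 milliliters. Because the errors are squared before averaging, a few large errors raise RMSE more than they raise MAE.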

The problem is that absolute values can vary wildly depending on what you are predicting. An error of 5 can be negligible in one model and significant in another. So we might want to evaluate the model using relative metrics, which express the level of error as a relative value that is usually between 0 and 1.

Relative absolute error (RAE) and relative squared error (RSE) produce metrics where the closer to 0 the error, the better the model.

RAE = ∑ abs(score - label) / ∑ abs(label - mean(label))
RSE = ∑ (score - label)^2 / ∑ (label - mean(label))^2
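A Python sketch of the two relative metrics using the common definitions, in which the model's total error is divided by the total error of a baseline that always predicts the mean label. With that baseline, a value of 1 means the model does no better than predicting the mean, and values above 1 are possible for a model that does worse. The numbers are made up.

```python
def rae(scores, labels):
    """Relative absolute error: total absolute error / total absolute deviation from the mean label."""
    mean = sum(labels) / len(labels)
    num = sum(abs(s - l) for s, l in zip(scores, labels))
    den = sum(abs(l - mean) for l in labels)
    return num / den


def rse(scores, labels):
    """Relative squared error: total squared error / total squared deviation from the mean label."""
    mean = sum(labels) / len(labels)
    num = sum((s - l) ** 2 for s, l in zip(scores, labels))
    den = sum((l - mean) ** 2 for l in labels)
    return num / den


labels = [100, 120, 140, 160]  # hypothetical actual values, mean 130
scores = [110, 120, 140, 150]  # hypothetical predictions
print(rae(scores, labels))  # 0.25 (20 / 80)
print(rse(scores, labels))  # 0.1 (200 / 2000)
```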

And the coefficient of determination, which we sometimes call R squared, is another relative metric, but this time a value closer to 1 indicates a good fit for the model.

CoD (R^2) = 1 - ∑ (score - label)^2 / ∑ (label - mean(label))^2
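A Python sketch of the coefficient of determination, computed as 1 minus the ratio between the squared residuals and the squared deviations of the labels from their mean. A perfect model scores 1, and a model that always predicts the mean scores 0. The numbers are made up.

```python
def r_squared(scores, labels):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(labels) / len(labels)
    ss_res = sum((l - s) ** 2 for s, l in zip(scores, labels))
    ss_tot = sum((l - mean) ** 2 for l in labels)
    return 1 - ss_res / ss_tot


labels = [100, 120, 140, 160]  # hypothetical actual values
scores = [110, 120, 140, 150]  # hypothetical predictions
print(r_squared(scores, labels))  # 0.9
```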

[Updated: Corrected a typo in the second error chart: MAE -> RSE. Thanks to Michael Mora for the comment.]


AI (I): Machine learning

Machine learning provides the foundation for artificial intelligence. So, what is it?

Machine learning is a technique in which we train a software model using data. The model learns from the training cases and then, we can use the trained model to make predictions for new data cases. To have a computer make intelligent predictions from the data, we just need a way to train it to perform the correct calculations.

We usually start with a data set that contains historical records, often called cases or observations. Each observation includes numeric features that quantify characteristics of the item we are working with; we can call these features ‘X’. In addition, we have some value that we are trying to predict, which we can call ‘Y’. The purpose is to use our training cases to train a machine learning model so it can calculate a value for ‘Y’ from the features in ‘X’. As a simplification, we are creating a function that operates on a set of features ‘X’ to produce predictions ‘Y’.

Generally speaking, there are two broad kinds of machine learning, supervised and unsupervised.

In supervised learning scenarios, we start with observations that include known values for the variable we want to predict; these known values are called labels. Because we already know the labels, the first thing we need to do is split our data. In this way, we can train the model using part of the data (for example, half of it) and keep the rest to test the performance of our model. When we obtain the desired results and we are confident our model works, we can use it with new observations for which the label is unknown, and generate new predicted values.

Unsupervised learning is different from supervised learning, in that this time we do not have known label values in the training data set. We train the model by finding similarities between the observations. After the model is trained, each new observation is assigned to the cluster of observations with the most similar characteristics.
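The text does not say how "most similar" is measured. A minimal sketch, assuming similarity means the smallest Euclidean distance to each cluster's centre (as in k-means, one common clustering technique): `assign_to_cluster` is a hypothetical helper, and the centroids are made up. `math.dist` needs Python 3.8 or later.

```python
import math


def assign_to_cluster(observation, centroids):
    """Return the index of the centroid nearest to the observation (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(observation, centroids[i]))


# Hypothetical centroids, e.g. produced earlier by a clustering algorithm such as k-means.
centroids = [[1.0, 1.0], [8.0, 9.0]]
cluster = assign_to_cluster([7.5, 8.0], centroids)  # 1
```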


Artificial Intelligence: Types of environments

Let’s first describe what an agent is in artificial intelligence. An intelligent agent is an autonomous entity that observes an environment through sensors, acts upon it using actuators, and directs its activity towards achieving goals. Intelligent agents may also learn or use knowledge to achieve their goals. They may be very simple or very complex.
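To make sensors, actuators and goals concrete, here is a hypothetical Python sketch of a very simple agent: a thermostat that senses a temperature and switches a heater. The class, the action names and the environment's heating and cooling rates are all invented for illustration.

```python
class ThermostatAgent:
    """Hypothetical reflex agent whose goal is to keep the room at a target temperature."""

    def __init__(self, target):
        self.target = target  # the agent's goal

    def act(self, percept):
        """percept: temperature from the sensor; the returned action drives the heater (actuator)."""
        return "heat_on" if percept < self.target else "heat_off"


def run(agent, temperature, steps):
    """Sense-act loop against a made-up environment: +1.0 degree when heating, -0.5 otherwise."""
    history = []
    for _ in range(steps):
        action = agent.act(temperature)
        temperature += 1.0 if action == "heat_on" else -0.5
        history.append((action, temperature))
    return history


trace = run(ThermostatAgent(20.0), 18.0, 4)
# [('heat_on', 19.0), ('heat_on', 20.0), ('heat_off', 19.5), ('heat_on', 20.5)]
```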

When designing artificial intelligence solutions we need to consider aspects such as the characteristics of the data (classified, unclassified, …), the nature of the learning algorithms (supervised, unsupervised, …) and the nature of the environment in which the AI solution operates. We tend to spend a lot of time on the first two aspects, but it turns out that the characteristics of the environment are one of the key elements in determining the right models for an AI solution. Understanding the characteristics of the environment is therefore one of the first tasks we need to do. From this point of view, we can consider several categories.

Fully vs Partially observable

An environment is called fully observable if what your agent can sense at any point in time is completely sufficient to make an optimal decision. For example, we can imagine a card game where all the cards are on the table: the momentary sight of all those cards is sufficient to make an optimal choice.

An environment is called partially observable when the agent needs memory to make the best possible decision. For example, in poker the cards are not openly on the table, and memorizing past moves will help you make a better decision.
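A hypothetical Python sketch of why memory helps when the environment is only partially observable: the agent cannot see the hidden cards, but by remembering the cards already revealed it can update the odds for the ones it cannot see. The class, the deck encoding and the ace example are illustrative assumptions.

```python
class CardMemoryAgent:
    """Hypothetical agent for a partially observable card game.

    It remembers every card it has seen and reasons about the cards still unseen.
    """

    def __init__(self, deck):
        self.unseen = set(deck)  # memory: cards not yet observed

    def observe(self, card):
        """Record a card that has been revealed (played or shown)."""
        self.unseen.discard(card)

    def chance_next_card(self, predicate):
        """Probability that a random unseen card satisfies predicate."""
        if not self.unseen:
            return 0.0
        return sum(1 for c in self.unseen if predicate(c)) / len(self.unseen)


def is_ace(card):
    return card[0] == 1


# A standard 52-card deck as (rank, suit) pairs; rank 1 is the ace.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]
agent = CardMemoryAgent(deck)
before = agent.chance_next_card(is_ace)  # 4 / 52
agent.observe((1, "S"))
agent.observe((1, "H"))
after = agent.chance_next_card(is_ace)   # 2 / 50
```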

Deterministic vs Stochastic

A deterministic environment is one where your agent’s actions uniquely determine the outcome. For example, in chess there is no randomness when you move a piece: the effect of a move is completely predetermined, and making the same move from the same position always produces the same outcome.

In a stochastic environment there is a certain amount of randomness involved. Games that involve dice are stochastic: while you can still deterministically move your pieces, the outcome of an action also depends on the roll of the dice, which you cannot predict.
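A hypothetical Python sketch contrasting the two kinds of environment with a simple board position: in the deterministic case the chosen action alone fixes the result, while in the dice game the same action can lead to different results. The function names and the one-dimensional board are invented for illustration.

```python
import random


def move_deterministic(position, steps):
    """Deterministic: the chosen action alone fixes the new position."""
    return position + steps


def move_with_dice(position, rng):
    """Stochastic: the new position also depends on a die roll the agent cannot predict."""
    return position + rng.randint(1, 6)


rng = random.Random(7)  # seeded only so the example is repeatable
same = {move_deterministic(3, 2) for _ in range(5)}   # always {5}
varied = {move_with_dice(3, rng) for _ in range(20)}  # values between 4 and 9
```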

Discrete vs Continuous

A discrete environment is one where you have finitely many action choices and finitely many things you can sense. For example, chess has finitely many board positions and finitely many moves you can make.

A continuous environment is one where the space of possible actions or things you could sense may be infinite. For example, in darts there are infinitely many ways to angle and accelerate a dart when you throw it.
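A hypothetical Python sketch of the practical difference: in a discrete environment the agent can list every possible action, while in a continuous one it can only sample or search among infinitely many. Tic-tac-toe stands in for chess here because chess move generation would not fit in a short example, and the dart angle and speed ranges are made up.

```python
import random


def tic_tac_toe_moves(board):
    """Discrete: the legal actions are the empty squares, a finite list we can enumerate."""
    return [i for i, cell in enumerate(board) if cell == " "]


def sample_dart_throws(n, rng):
    """Continuous: angle (degrees) and speed (m/s) are real numbers, so we can only
    sample some of the infinitely many possible throws."""
    return [(rng.uniform(-90.0, 90.0), rng.uniform(5.0, 25.0)) for _ in range(n)]


board = ["X", "O", " ",
         " ", "X", " ",
         " ", " ", "O"]
moves = tic_tac_toe_moves(board)  # [2, 3, 5, 6, 7]
throws = sample_dart_throws(3, random.Random(0))
```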

Benign vs Adversarial

In benign environments, the environment might be random or stochastic, but it has no objective of its own that would contradict your objective. Weather is benign: it might be random and it might affect the outcome of your actions, but it is not really out there to get you.

In adversarial environments, the opponent is really out to get you. In chess, the environment (your opponent) has the goal of defeating you. Obviously, it is much harder to find good actions in adversarial environments, where the opponent actively observes you and counteracts what you are trying to achieve, than in benign environments.
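One way to see why the two cases call for different decisions, sketched in Python under stated assumptions: in a benign environment an agent can plan for the average outcome, while against an opponent it is safer to plan for the worst case (a one-step maximin rule, the idea behind minimax search). The payoff table is hypothetical, and treating the listed outcomes as equally likely is a simplification.

```python
def best_action_benign(outcomes):
    """Benign: pick the action with the best average outcome, assuming each listed
    outcome is equally likely (a simplifying assumption)."""
    return max(outcomes, key=lambda a: sum(outcomes[a]) / len(outcomes[a]))


def best_action_adversarial(outcomes):
    """Adversarial: assume the opponent will pick the outcome that is worst for us,
    and choose the action whose worst case is best (one-step maximin)."""
    return max(outcomes, key=lambda a: min(outcomes[a]))


# Hypothetical payoffs: for each of our actions, the outcomes the environment may produce.
outcomes = {"attack": [10, -4], "defend": [2, 1]}
benign_choice = best_action_benign(outcomes)            # "attack" (average 3 vs 1.5)
adversarial_choice = best_action_adversarial(outcomes)  # "defend" (worst case 1 vs -4)
```

The same payoffs lead to opposite choices, which is one reason adversarial environments need different, and usually more expensive, decision procedures.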

I have seen a few more classifications, but more or less all of them list the same or very similar categories.

Note: This article is based on my notes from the course Intro to Artificial Intelligence (Udacity).
