Docker: Parent context

When building Docker images, several images often need the same custom files or resources. In that case, it is convenient to keep them just once in a shared folder and have every Dockerfile point to that shared folder.

The problem is that, when we run a build, Docker can only see the files under the build context, which is the directory passed to the docker build command (usually ".", the current directory). Initially, this can cause some failures.

Let’s see an example of this. Let’s imagine we have a few projects with a structure like this:

projectA -> DockerfileA
         -> fileA
         -> resources -> script.sh
         -> resources -> certs -> cert
projectB -> DockerfileB
         -> fileB
         -> resources -> script.sh
         -> resources -> certs -> cert
projectC -> DockerfileC
         -> fileC
         -> resources -> script.sh
         -> resources -> certs -> cert

In the above folder structure, we have three different projects, and some resources are duplicated, which makes them harder to maintain and error-prone. For this reason, we decide to restructure the folders and share the resources, with something like:

projects -> projectA -> DockerfileA
         -> projectA -> fileA
         -> projectB -> DockerfileB
         -> projectB -> fileB
         -> projectC -> DockerfileC
         -> projectC -> fileC
         -> resources -> script.sh
         -> resources -> certs -> cert

As we can see, the above structure seems easier to maintain and less error-prone. The only consideration we need to have now is the docker build context.

Let’s say our Dockerfiles contain, among others, ADD or COPY instructions. In this case, something like:

...
COPY ./resources/script.sh /opt/
COPY ./resources/certs/cert /opt/cert
...

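Under the first folder structure, we build the images with something like this (image names must be lowercase and, as each Dockerfile has a custom name, we pass it with -f):

(pwd -> ~/projectA)
$ docker build -t project-a -f DockerfileA .

(pwd -> ~/projectB)
$ docker build -t project-b -f DockerfileB .

(pwd -> ~/projectC)
$ docker build -t project-c -f DockerfileC .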

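But if we build in the same way under the second folder structure, from inside each project folder, the shared resources folder is no longer part of the context. If the Dockerfile tries to reach it through the parent folder (for example, with ../resources/script.sh), the build fails with an error:

COPY failed: Forbidden path outside the build context.

This is because the build context does not include the folders located in the parent folder, so the build cannot access them.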

To avoid this problem, we only need to execute the build command from the parent folder and add an extra flag (-f) pointing to the Dockerfile we want to use when building the image. Something like:

(pwd -> projects)
$ docker build -t project-a -f projectA/DockerfileA .
$ docker build -t project-b -f projectB/DockerfileB .
$ docker build -t project-c -f projectC/DockerfileC .

This should solve the problem because now the context of the docker build command is the parent folder, and paths such as ./resources/script.sh are resolved from there.

Fake news

We are living in unfortunate times. A pandemic is ravaging all countries and the population has had to take extreme measures such as confinement and physical distancing.

Luckily, nowadays, we have technologies like the Internet, messaging apps, videoconference apps, social networks and others that allow us to practice physical distancing while avoiding social distancing. Unluckily, like any other tools, they can be used for good and for bad.

One of these bad uses, especially visible right now, is the publication and spread of Fake News and False Information, with all the danger, uncertainty and manipulation of public opinion they bring to the table.

The term Fake News is closely associated with politics, while the term False Information refers to a diverse range of disinformation covering topics such as health, the environment and economics across all platforms and genres.

False information is not new. However, it has become a hot topic in the last few years and, in this health crisis, it has become more evident than ever. The increase in the use of social media and messaging apps these days due to lockdowns and isolation can cause an overload of information and make it more difficult to tell whether stories are credible or not. In addition, not just the increased use of these platforms but also the lack of knowledge about how the Internet works has contributed to the spread of all kinds of false information.

There are different types of false information, based on the intention they pursue:

  • Clickbait: Designed and written to attract more visitors to a website, usually to monetize the increase in traffic. It is based on sensationalist headlines that sacrifice truth and/or accuracy.
  • Propaganda: Stories designed to mislead or to provide biased points of view.
  • Satire/Parody: Created for pure entertainment or parody. Although the intention is clear at the origin, when these stories spread and lose their context they can mislead audiences.
  • Sloppy Journalism: Journalism is a serious job, and proper investigation and verification need to be done before publishing a story. When journalists take shortcuts or do not verify their sources, misleading or wrong stories can be published.
  • Biased News: Social media personalisation algorithms tend to show in users’ feeds the news that is more aligned with their ideology and thoughts (how this is done is a completely different topic). This can lead users to believe and spread misleading news.

It is more important now than ever to exercise our critical thinking and not blindly trust everything we see or that is shared with us through messaging groups and social networks. Especially because every time we share or resend false information we are legitimizing it in the eyes of the people that know us (well, not everyone, all of us have this friend that…you know).

There are a few things and questions we can consider before sharing or “legitimizing” something:

  • Be suspicious of any information that is very scandalous or emotional. Put it in quarantine before taking it for granted. Find out where it came from and, above all, do not send it until you are sure it is true.
  • Often, fake news carries fake signatures from recognized journalists, and even front pages and media headlines are faked to make it appear true. Again, before spreading it, look for that news on the website of the media outlet. Check that they have actually published it.
  • Be suspicious of any message that includes phrases such as “it’s true”, “I’ve verified it”, “it’s an aunt/niece of mine”, etc. They are usually phrases from false WhatsApp chains.
  • Many audio and video messages from alleged experts are used to mislead people. Search their names on the internet before accepting what they say as true if the information does not come from a reliable media outlet.
  • Many photos are manipulated. Before taking them for granted, make sure they are not. There are tools that allow you to check this, such as Google Reverse Image Search, which helps to find the original source of the news and its first publication date.
  • Visit platforms specialized in debunking hoaxes whenever possible.
  • And use common sense: check before forwarding and, if something does not smell good, it probably is not good.

I hope everyone is doing well. Stay safe and, for everyone’s benefit, stay sharp and cautious when you publish or re-publish information out there. Knowledge is power, but only when it is correct.

Builder pattern + inheritance

In general, implementing the builder pattern in Java is very simple: a few lines of code and the problem is solved. When we are using inheritance, however, it is not as intuitive as it should be. Lately, I have seen poor attempts at doing it that did not achieve the desired result.

In this article, we are going to build a very simple example of that.

public class Parent {

    private final String a;

    protected Parent(final Builder<?> builder) {
        this.a = builder.a;
    }

    public String getA() { return a; }

    public static class Builder<T extends Builder<T>> {

        private String a;

        @SuppressWarnings("unchecked") // safe as long as T is the concrete builder type
        public T a(final String a) {
            this.a = a;
            return (T) this;
        }

        public Parent build() {
            return new Parent(this);
        }
    }
}

In this first class, the parent class, we can see we are using generics to allow child classes to pass their builders.

public class Children extends Parent {

    private final String b;

    protected Children(final Builder builder) {
        super(builder);
        this.b = builder.b;
    }

    public String getB() {
        return b;
    }

    public static Builder builder() {
        return new Builder();
    }

    public static class Builder extends Parent.Builder<Builder> {
        private String b;

        public Builder b(final String b) {
            this.b = b;
            return this;
        }

        @Override
        public Children build() {
            return new Children(this);
        }
    }
}

Here, we can see how we pass the child builder as the type argument of the parent builder (Parent.Builder<Builder>). This will allow us to set values for the properties of both the parent and the child using the same builder.

public class Main {

    public static void main(String[] args) {
        final Children children = Children.builder()
            .a("Hi")
            .b("Bye")
            .build();

        // Neither class overrides toString(), so we print the properties directly
        System.out.println(children.getA() + " " + children.getB());
    }
}

Here, we can see how to use the builder. Thanks to the generics, the call to .a("Hi") returns a child builder and not a parent builder, which would make it impossible to call .b("Bye").

I hope it is useful.

Detecting a phishing email

Christmas and New Year are usually happy moments: families, people, lights on the streets, ex-pats flying home, gifts… but it is a very good season for phishing emails too, in both personal and enterprise environments.

This article is just a collection of rules, more focused on enterprise environments but applicable to both, to try to educate our employees or ourselves to prevent ransomware infections or any other infection received by email. They are not golden rules, just some basic guidelines for email users to follow.

  • Do not trust the displayed name of who the email is from: Just because it says it is coming from someone you know or trust does not mean that it truly is. Check the email address to confirm the real sender. Different email clients have different ways to do this but, basically, it is something that can be done with just one click.
  • Check the email signature: Usually, legitimate enterprise users include a full signature block at the bottom of their emails. If it is there, check if it is correct. If the email is from someone you have exchanged emails with before and suddenly the signature is not there, be suspicious.
  • Consider the salutation: People tend to address the person they are sending the email to by name. If the salutation is vague or generic, e.g. “valued customers”, or just addresses the recipient by title, e.g. “Dear Accountant”, be suspicious.
  • Check for spelling errors: All of us make mistakes when writing, and some people write in a language that is not their own but, in general, we have autocorrection (not always for good) and people care about spelling and being grammatically correct. Attackers are usually careless about these kinds of details.
  • Double-check the links: Hover the mouse over the different links in the email before you click. If the address shown looks strange or does not match what the link description says, do not click on it.
  • Is the email asking for personal information?: Legitimate companies are unlikely to ask for personal information by email. In some cases, they actively remind you about this. As an example, I am sure everyone here has received those emails from the bank reminding you they will never ask you anything by email.
  • Be careful with attachments: If you have any doubts about the email, do not click on the attachments, no matter how legit they look or how nice their name is. Contact the sender of the email, if possible, to confirm the legitimacy.
  • Beware of urgency: Emails like this sometimes try to create a sense of urgency to push recipients to act unwisely, focusing on what the email says and ignoring the warning signals. Do not do that. Take your time (it is going to take a minute) to check the email and do a few basic checks about its legitimacy. As an example, the typical email from the CEO to the accountant: “Hi, I am John Doe (CEO), I need you to transfer 1 million to xxxxxx or we are going to lose the deal…”
  • Better safe than sorry: If you see some signals that make you doubt the legitimacy of an email, contact your SOC if you have one, contact the sender or use your common sense.

As I have said, these are just a few basic, common-sense pieces of advice that we sometimes forget.

Remote Tail

This article is just a quick code snippet for the bash shell that allows us to tail log files from a remote endpoint.

Nowadays, we are building plenty of microservices, and the common pattern is to aggregate their logs using tools like Kibana to store and search them with the help of some correlation ids. Although this is a very good solution, sometimes we do not need anything that fancy.

In Spring Boot Actuator, we can find the endpoint “/logfile”, which has proven to be pretty useful many times. This endpoint allows us to retrieve the log file from the server. We can download it or just check it in the browser. The problem comes when the log file reaches a size that our browser cannot manage properly. We can use tools like “wget” to download the log file and analyse it locally, but it seems absurd to download the whole file every time we want an update.

I have a different proposal. With a few lines of shell scripting, we can write a snippet that tails the remote log file into a local file, and then we can use “less” to monitor it or perform searches on it.

#!/bin/bash

#
# Check if the given server support HTTP range header
# param 1: url
#
function check_ranges_support() {
  # -I already sends a HEAD request; header names are case-insensitive
  ret=$(curl -s -I "$1" | grep -i "Accept-Ranges: bytes")

  if [ -z "$ret" ]; then
    echo "Ranges are not supported by the server"
    exit 1
  fi
}

#
# Recovers the total length of the given file
# param 1: url
#
function get_length() {
  # match the header name case-insensitively (HTTP/2 servers send it in lowercase)
  ret=$(curl -s -I "$1" | awk 'tolower($1) == "content-length:" {print $2}')
  echo "$ret" | sed 's/[^0-9]*//g'
}

#
# Print the requested part of the remote file
# param 1: url
# param 2: off
# param 3: len
# param 4: output file
#
function print_to_logfile() {
  # quoting avoids word splitting and globbing on URLs that contain ? or &
  curl --header "Range: bytes=$2-$3" -s "$1" >> "$4"
}

#
# Clean the previous log file
#
function clean_logfile() {
  rm -f "$1"
}

# call validation
if [ $# -lt 1 ]; then
  echo "Syntax: remote-tail.sh <URL> [<logfile name>]"
  exit 1
fi

url=$1
offset=0
logfile=tmplog

if [ $# -eq 2 ]; then
  logfile=$2
fi

check_ranges_support "$url"
clean_logfile "$logfile"

len=$(get_length "$url")
off=$((len - offset))

# stop if the remote file shrinks (for example, after a log rotation)
until [ "$off" -gt "$len" ]; do
  len=$(get_length "$url")

  if [ "$off" -eq "$len" ]; then
    sleep 5 # we refresh every 5 seconds if no changes
  else
    sleep 1 # we refresh every second to not hammer the server too much
    print_to_logfile "$url" "$off" "$len" "$logfile"
  fi

  off=$len
done

We can use it with:

./remote-tail.sh https://server/logfile

I hope it helps.

ML – Python (VII) – Overfitting & Underfitting

In the previous article, we mentioned overfitting and underfitting but we did not go into detail about them. Let’s take a deeper dive into them.

When we work with a set of data to solve a prediction or classification problem, we try to achieve our goals by building a model with the training data and testing it with the testing data. We can make adjustments to the features we are using or to the model itself.

By modifying the model, we can end up with a model that is too simple or too complex. This is when we need to consider the concepts of overfitting and underfitting.

Underfitting

As we can see in the image, the underfitting concept refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model, and this will be obvious because it will perform poorly even on the training data.

It happens when we do not have enough data to build a precise model or when we try to fit a linear model to non-linear data.

There are a few techniques we can try to prevent underfitting:

  • Sometimes the model is underfitting because it does not have enough features. In this case, we can add other features so the model can capture the data better.
  • Add polynomial features, a common technique in machine learning algorithms. For example, a linear model becomes more expressive by adding quadratic or cubic terms.
  • Reduce the regularization parameters. The motivation behind regularization is to prevent overfitting, so when the model is underfitting we have to reduce the regularization parameters.
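
As a minimal sketch of the polynomial features idea, assuming noise-free quadratic data and a small hand-written least-squares solver (the helper names fit_least_squares and mean_squared_error are made up for this example and do not belong to any library), we can compare a linear model with and without the extra quadratic feature:

```python
def fit_least_squares(rows, targets):
    """Solve the normal equations (X^T X) w = X^T y with Gauss-Jordan elimination."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def mean_squared_error(rows, targets, weights):
    errors = [sum(w * v for w, v in zip(weights, r)) - t
              for r, t in zip(rows, targets)]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical data: 13 points in [-3, 3] following y = x^2
xs = [x / 2 for x in range(-6, 7)]
ys = [x * x for x in xs]

linear_rows = [[1.0, x] for x in xs]            # features: 1, x
quadratic_rows = [[1.0, x, x * x] for x in xs]  # features: 1, x, x^2

linear_mse = mean_squared_error(
    linear_rows, ys, fit_least_squares(linear_rows, ys))
quadratic_mse = mean_squared_error(
    quadratic_rows, ys, fit_least_squares(quadratic_rows, ys))

print("linear model MSE:   ", linear_mse)
print("quadratic model MSE:", quadratic_mse)
```

The straight line cannot follow the curve, so its error stays high even on the training data, which is the symptom of underfitting described above. Once the quadratic term is available, the same fitting procedure brings the error down to practically zero.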

Overfitting

On the opposite side, the overfitting concept refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.

Overfitting is more probable in non-parametric and non-linear models.
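
To illustrate it, here is a minimal sketch with a 1-nearest-neighbour regressor, a typical non-parametric model. The data is invented for the example (a signal y = 2x with +/-1 noise in the training targets) and the function names are hypothetical helpers, not a library API:

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """1-nearest-neighbour: copy the target of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data: the real signal is y = 2x, the training targets carry +/-1 noise
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
# Unseen points, with noise-free targets
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

train_predictions = [nearest_neighbour_predict(train_x, train_y, x) for x in train_x]
test_predictions = [nearest_neighbour_predict(train_x, train_y, x) for x in test_x]

train_mse = mean_squared_error(train_predictions, train_y)
test_mse = mean_squared_error(test_predictions, test_y)

print("training MSE:", train_mse)
print("test MSE:    ", test_mse)
```

The training error is exactly zero because every training point is its own nearest neighbour, so the model has memorised the noise. On the unseen points, that memorised noise turns into a clearly higher error.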

There are a few techniques we can try to prevent overfitting:

  • Cross-validation: It uses our initial training data to generate multiple mini train-test splits, and it uses these splits to tune our model. Cross-validation allows us to tune hyperparameters with only our original training set. This allows us to keep our test set as a truly unseen dataset for selecting our final model.
  • Train with more data: Training with more data can help algorithms detect the signal better but, if we just add more noisy data, this technique won’t help. That’s why we should always ensure our data is clean and relevant.
  • Remove features: We can manually improve an algorithm’s generalizability by removing irrelevant input features. As a criterion, if a feature does not make sense or is hard to justify, it is a good candidate to be removed.
  • Early stopping: When training an algorithm, we can measure how well each iteration of the model performs. Up until a certain number of iterations, new iterations improve the model. After that point, the model’s ability to generalize can weaken as it begins to overfit the training data. Early stopping refers to stopping the training process before the learner passes that point.
  • Regularization: Regularization refers to a broad range of techniques for artificially forcing our model to be simpler.
  • Ensembling: Ensembles are machine learning methods for combining predictions from multiple separate models.
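
Two of the techniques above, cross-validation and early stopping, can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the function names, the patience parameter and the list of validation losses are invented for the example, and in a real project we would probably rely on the utilities of an ML library instead:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def best_epoch_with_patience(validation_losses, patience):
    """Return the epoch to keep, stopping after `patience` epochs without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # the validation loss stopped improving: stop training here
    return best_epoch


# Cross-validation: every sample is used for validation exactly once
folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)

# Early stopping: hypothetical validation losses, one per training iteration
losses = [0.9, 0.6, 0.45, 0.4, 0.42, 0.47, 0.55]
stop_at = best_epoch_with_patience(losses, patience=2)
print("keep the model from iteration", stop_at)
```

In the cross-validation part, the model would be trained on each train split and scored on the matching validation split, so the real test set stays untouched. In the early stopping part, training ends once the validation loss has not improved for two iterations in a row, and we keep the model from the best iteration seen so far.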

The good one

Finally, the middle graph shows a pretty good predicted line. It covers the majority of the points in the graph and also maintains the balance between bias and variance.

That is all for today.
