Diagrams as Code

I imagine that everyone reading this blog is, by now, familiar with the term Infrastructure as Code (IaC), if not because we have written a few articles about it on this blog, then because it is a widespread term nowadays.

While I assume familiarity with the IaC term, I have not heard many people talking about a similar concept focused on diagrams: Diagrams as Code (DaC). I am someone who, when arriving at a new environment, finds diagrams very useful to get a general view of how a system works, or when giving explanations to a new joiner. But, at the same time, I must admit that sometimes I am not diligent enough to update them or, even worse, I have arrived at projects where the diagrams are either too old to make sense or there are none at all.

The reasons for this vary: diagrams can be hard for developers to maintain, lack of time, lack of knowledge of the system, no obvious location for the editable file, only obsolete diagrams being available, and so on.

I have been playing lately with the DaC concept and, with a little bit of effort (all new practices require some effort until they become part of the workflow), it can fill this gap and help developers, and other profiles in general, keep diagrams up to date.

In addition, these tools bring some extra benefits, such as easy version control, needing only a text editor to modify a diagram, and the ability to generate the diagrams anywhere, even as part of our pipelines.

This article covers some of the tools I have found and played with lately. The goal is to give a brief introduction to each tool, become familiar with its capabilities and restrictions, and try to figure out whether it could be introduced in our production environments as a long-term practice. And, why not, compare which tool produces the most eye-catching diagrams; in the end, people are going to pay more attention to those.

Just a quick note before we start: all the tools we are going to see are open source and freely available. I am not going to explore any paid tools, and I am not involved in any way with the tools we are going to be using.

Graphviz

The first tool we are going to see is Graphviz. I will say in advance that we are not exploring this option in depth because it is too low level for my taste and for the purpose we are trying to achieve. It is a great solution if you want to generate diagrams as part of your applications and, in fact, some of the higher-level solutions we are going to explore use it behind the scenes to generate their diagrams. With that said, the project defines itself as:

Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.

The Graphviz layout programs take descriptions of graphs in a simple text language, and make diagrams in useful formats, such as images and SVG for web pages; PDF or Postscript for inclusion in other documents; or display in an interactive graph browser. Graphviz has many useful features for concrete diagrams, such as options for colors, fonts, tabular node layouts, line styles, hyperlinks, and custom shapes.

https://graphviz.org

The installation process on an Ubuntu machine is quite simple using the package manager:

sudo apt install graphviz

As I said before, we are not going to explore this tool more deeply. The documentation includes some examples built in C and covers the command-line tools it offers but, for my taste, it is too low-level a solution to use directly when implementing DaC practices.

PlantUML

The next tool is PlantUML. Despite what the name seems to imply, the tool is able to generate multiple types of diagrams, all of them listed on their web page: sequence diagrams, use case diagrams, class diagrams, state diagrams, activity diagrams and more.

The diagrams are defined using a simple and intuitive language, and images can be generated in PNG, SVG or LaTeX format.

They provide an online tool that can be used for evaluation and testing purposes but, in this case, we are going to install it locally, especially to test how hard it is to set up and how well it can be integrated into our CI/CD pipelines. By the way, this is one of the tools that uses Graphviz behind the scenes.

There is no installation process as such; we just need to download the corresponding JAR file from their download page.

Now, let’s use it. We are going to generate a state diagram, just one of the examples we can find on the tool’s page. The code used to generate the example diagram follows and is stored in a plain text file:

@startuml
scale 600 width

[*] -> State1
State1 --> State2 : Succeeded
State1 --> [*] : Aborted
State2 --> State3 : Succeeded
State2 --> [*] : Aborted
state State3 {
  state "Accumulate Enough Data\nLong State Name" as long1
  long1 : Just a test
  [*] --> long1
  long1 --> long1 : New Data
  long1 --> ProcessData : Enough Data
}
State3 --> State3 : Failed
State3 --> [*] : Succeeded / Save Result
State3 --> [*] : Aborted

@enduml

Now, let’s generate the diagram:

java -jar ~/tools/plantuml.jar plantuml-state.txt

The result is going to be something like:

State Diagram generated with PlantUML

As we can see, the result is pretty good, especially if we consider that it took us around five minutes to write the code without knowing the syntax, and we have not had to deal with positioning the elements on a drag-and-drop screen.

Taking a look at the examples, the diagrams are not particularly beautiful but, the ease of use and the variety of diagrams supported make this tool a good candidate for further exploration.

WebSequenceDiagrams

WebSequenceDiagrams is just a web page that allows us to create sequence diagrams in a quick and simple way. It has some advantages: it offers multiple colours, there is no need to install anything and, having only one purpose, it covers it quite well in a simple way.

We are not going to explore this option further because it does not cover our needs: we want a wider variety of diagrams and it does not seem easy to integrate into our daily routines and CI/CD pipelines.

Asciidoctor Diagram

I assume everyone is more or less aware of the existence of the Asciidoctor project. The project is a fast, open-source text processor and publishing toolchain for converting AsciiDoc content to HTML5, DocBook, PDF, and other formats.

Asciidoctor Diagram is a set of Asciidoctor extensions that enable you to add diagrams, which you describe using plain text, to your AsciiDoc document.

The installation of the extension is quite simple; it is just a RubyGem that can be installed in the standard way.

gem install asciidoctor-diagram

There are other usage options but we are going to build an example from the terminal, using the PlantUML syntax we have already seen.

[plantuml, Asciidoctor-classes, png]     
....
class BlockProcessor
class DiagramBlock
class DitaaBlock
class PlantUmlBlock

BlockProcessor <|-- DiagramBlock
DiagramBlock <|-- DitaaBlock
DiagramBlock <|-- PlantUmlBlock
....
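Assuming the snippet above is saved in a file called, for example, ‘asciidoctor-diagrams.adoc‘ (the name is just an illustration), we can render it from the terminal by loading the extension when invoking Asciidoctor:

asciidoctor -r asciidoctor-diagram asciidoctor-diagrams.adoc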

The result is something like:

Generated with Asciidoctor Diagrams extension

One of the advantages of this tool is that it supports multiple diagram types. As we can see, we have used the PlantUML syntax but there are many more available; check the documentation.

Another advantage is that it is based on Asciidoctor, a very well-known tool, and, in addition to the image, it generates an HTML page with extra content if desired. It seems worth further exploration.

Structurizr

I was going to skip this one because, despite having a free option, it requires a subscription for certain features and, besides, it does not seem as easy to integrate and use as the other tools we are seeing.

Despite all of this, I thought it was worth mentioning due to the demo page they offer where, with just a few clicks, you can see the same diagram expressed in different syntaxes such as PlantUML or WebSequenceDiagrams.

Diagrams

This is a tool that seems to have been implemented explicitly to follow the Diagrams as Code practice with a focus on infrastructure. It allows you to write diagrams using Python and, in addition to supporting nice images for the main cloud providers, it allows you to fetch images that are not bundled to use in your diagrams.

Installation can be done using any of the common mechanisms available in Python; in our case, pip3.

pip3 install diagrams

This is another one of the tools that, behind the scenes, uses Graphviz to do its job.

Let’s create our diagram now:

from diagrams import Cluster, Diagram
from diagrams.onprem.analytics import Spark
from diagrams.onprem.compute import Server
from diagrams.onprem.database import PostgreSQL
from diagrams.onprem.inmemory import Redis
from diagrams.onprem.aggregator import Fluentd
from diagrams.onprem.monitoring import Grafana, Prometheus
from diagrams.onprem.network import Nginx
from diagrams.onprem.queue import Kafka

with Diagram("Advanced Web Service with On-Premise", show=False):
    ingress = Nginx("ingress")

    metrics = Prometheus("metric")
    metrics << Grafana("monitoring")

    with Cluster("Service Cluster"):
        grpcsvc = [
            Server("grpc1"),
            Server("grpc2"),
            Server("grpc3")]

    with Cluster("Sessions HA"):
        master = Redis("session")
        master - Redis("replica") << metrics
        grpcsvc >> master

    with Cluster("Database HA"):
        master = PostgreSQL("users")
        master - PostgreSQL("slave") << metrics
        grpcsvc >> master

    aggregator = Fluentd("logging")
    aggregator >> Kafka("stream") >> Spark("analytics")

    ingress >> grpcsvc >> aggregator

And, let’s generate the diagram:

python3 diagrams-web-service.py

With that, the result is something like:

Diagram generated with Diagrams

As we can see, it is easy to understand and, the best part, it is quite eye-catching. And, everything looks in place without the need to mess with a drag and drop tool to position our elements.

Conclusion

As always, we need to evaluate which tool best fits our use case but, after seeing a few of them, my conclusions are:

  • If I need to generate infrastructure diagrams, I will go with the Diagrams tool. Being based on Python, it seems easy to use and the results are very visually appealing.
  • For any other type of diagram, I would be inclined to use PlantUML. It supports a great deal of diagram types and, despite the results not being the most beautiful, they can be clear and useful enough.

Asciidoctor Diagram seems a good option if your team or organisation is already using Asciidoctor, or if we want something more than just a generated diagram.


Ubuntu Multipass

The world of IT, software development, operations and similar fields tends to be full of people who like to try new trends or tools, whether directly related to their day-to-day tasks or just out of curiosity. One quick way of doing this is to install all the tools and libraries on our machines and, after we have finished, try to clean everything up or, at least, revert all the mistakes or not-so-good practices we applied while learning. While this is a valid way, over time our machines get polluted with lost dependencies, configuration files or libraries.

To avoid that, it seems better to try all the new stuff in an isolated environment and, if we like it and decide to use it in our daily environments, install it from scratch again, probably correcting some initial mistakes or avoiding some bad practices.

There are plenty of solutions out there to achieve this and to have an easy-to-set-up, throw-away environment, most of them based on virtual machines or some kind of virtualisation: more traditional ones such as VirtualBox or VMware, or management solutions for virtual machines such as Vagrant.

Today, I just want to bring to the table a different one I have been playing with lately and did not know about a few months ago. I do not know how popular or widespread it is but, I think, knowing different options is always a plus. The tool is called Multipass and, as Ubuntu describes it, it offers “Ubuntu VMs on demand for any workstation. Multipass can launch and run virtual machines and configure them with cloud-init like a public cloud. Prototype your cloud launches locally for free.”

I have found it very easy to use and, for the purpose of having throw-away isolated environments lying around, quite useful.

We are going to see the installation process and the basic execution of a few commands related to an instance.

Installation

Before we apply the steps to install Multipass on our machines, there are a couple of requirements we need to consider. They are related to the platform that is going to be used to virtualise the images. On recent operating systems no extra requirements are needed but some older ones have them; check the official documentation.

For Linux:

sudo snap install multipass

For Windows:

Just download the installer from the web page and proceed with the suggested steps.

For MacOS:

macOS offers us two different alternatives: one based on an installation file, similar to Windows, and one based on a package manager such as Homebrew. If installing using the installation file, just execute it and follow the suggested steps; if installing using Homebrew, just execute the appropriate command:

brew install --cask multipass

Once the installation is done, any other command executed should be the same in all three environments.

Just as a side note, there is the possibility of using VirtualBox as the virtualisation platform if we want to; this is completely optional and depends only on our preferences (I am not using it). I encourage you to check the official documentation on this specific point for the command to switch drivers.

Now that we have finished the installation, let’s create our first instance.

Creating and using an instance

Let’s check what images are available:

‘find’ execution – List of available images

We can see there are multiple images available but, in this case, we are going to create an instance using the latest version (20.10). By default, if no image is specified, multipass uses the latest LTS version.
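For reference, the commands behind these captures look roughly like this (the ‘20.10‘ image and the instance name are just the choices for this example):

multipass find
multipass launch 20.10 --name test-vm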

It is worth mentioning that, by default, multipass assigns some default values to our instance in terms of CPU, disk, memory and others.

default values – multipass launch -h
‘launch’ execution – Creates a new instance

As we can see, it is quite fast and, if we create a second instance, it will be even faster.

We can execute a command inside the instance:

‘exec’ execution – Executes a command inside the instance
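In text form, that is something like (keeping the hypothetical ‘test-vm‘ instance name):

multipass exec test-vm -- lsb_release -a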

Or, we can just login into the instance:

‘shell’ execution – Login into the instance
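Which corresponds to:

multipass shell test-vm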

From now on, we can just work with this instance as we please.

There are a few commands we can use to manage our instances, such as listing running instances, listing available instances or getting information about running instances. All these commands are shown in the help menu.

‘help’ execution – List available commands
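The ones I use most often look like this (again, the instance name is just an example):

multipass list
multipass info test-vm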

Mounting a shared folder

Multipass offers the possibility of mounting folders to share information between the host and the instance using the mount command.

Sharing a folder between host and instance
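A typical invocation, with a host folder on the left and the mount point inside the instance on the right (paths are just examples), is:

multipass mount ~/shared test-vm:/home/ubuntu/shared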

Cleaning (Deleting an instance)

Finally, as we do not want to leave throw-away instances lying around, once we have finished working with an instance we can remove it.

‘delete’ execution – Removes an instance
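In command form, ‘delete‘ marks the instance as deleted and ‘purge‘ removes it permanently:

multipass delete test-vm
multipass purge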

This is just a brief introduction to multipass. More complex scenarios can be found on the official documentation.


Container attack vectors

We live in a containerised world. Container solutions like Docker are now so widespread that they are no longer a niche thing or a buzzword; they are mainstream. Many companies use them and the ones that do not are probably dreaming of doing so.

The only problem is that they are still relatively new. Their adoption has been fast, arriving like a storm in all kinds of industries that use technology, but from a security point of view we, as an industry, do not have all the awareness we should. Containers and, especially, containers running in cloud environments partially hide the fact that they exist and that they need to be part of our security considerations. Some companies use them thinking they are completely secure, trusting that the cloud providers or the companies that build the images take care of everything; for less technology-focused businesses they are an abstraction rather than a real and tangible thing. They are not the old bare-metal servers, desktop machines or virtual machines companies were used to and, to a certain point, worried about because they were things that could be touched.

All of that means that, while security concerns for web applications are first-class citizens (not as much as they should be, but the situation has improved a lot over the last few years), security concerns about containers seem to be the black sheep of the family that no one talks about. And this is not right. Container security deserves the same level of concern, the same attention, and a place in the development life cycle.

In the same way that web applications can be attacked in multiple ways, containers have their own attack vectors, some of which we are going to see here. We will see that some of them can easily be compared with known attack vectors in spaces we are more aware of, like web applications.

Vulnerable application code

Containers package applications and third-party dependencies that can contain known flaws or vulnerabilities. There are thousands of published vulnerabilities that attackers can take advantage of to exploit our systems if they are present in the applications running inside the containers.

The best way to avoid running containers with known vulnerabilities is to scan the images we are going to deploy, and not just as a one-time thing: scanning should be part of our delivery pipelines and run continuously. In addition to known vulnerabilities, scanners should look for out-of-date packages that need an update, and some available scanners even try to find possible malware in the images.
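As an illustration, and assuming a scanner such as Trivy is available in the pipeline, a scan step can be as simple as (the image name is made up):

trivy image my-registry/my-app:1.0.0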

Badly configured container images

When configuring how a container image is going to be built, vulnerabilities can be introduced by mistake, or if proper attention is not paid to the build process, and later exploited by attackers. A very common example is configuring the container to run as root unnecessarily, giving it more privileges on the host than it really needs.
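A minimal sketch of how this is usually avoided in a Dockerfile, creating and switching to an unprivileged user (the base image, user and file names are just illustrative):

FROM openjdk:11-jre-slim
# Create an unprivileged user and run the application as that user instead of root
RUN useradd --create-home appuser
USER appuser
COPY --chown=appuser app.jar /home/appuser/app.jar
ENTRYPOINT ["java", "-jar", "/home/appuser/app.jar"]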

Build machine attacks

Like any piece of software, the software we use to run CI/CD pipelines and build container images can be successfully attacked. Attackers can add malicious code to our containers during the build phase, obtaining access to our production environment once the containers have been deployed and even using these compromised containers to pivot to other parts of our systems or networks.

Supply chain attacks

Once container images have been built they are stored in registries and retrieved or “pulled” when they are going to be run. Unfortunately, no one can guarantee the security of these registries and an attacker can compromise the registry and replace the original image with a modified one that includes a few surprises.

Badly configured containers

When creating configuration files for our containers, e.g. a YAML file, we can make mistakes and give the containers configuration they do not need. Some possible examples are unnecessary access privileges or unnecessarily open ports.
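As a sketch of the kind of settings that help here, a Kubernetes container definition (all names and the image are illustrative) can explicitly restrict privileges:

apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: example/app:1.0.0
      ports:
        - containerPort: 8080
      # Explicitly drop privileges instead of relying on defaults
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true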

Vulnerable host

Containers run on host machines and, in the same way we try to ensure containers are secure, hosts should be too. Sometimes they run old versions of orchestration components with known vulnerabilities, or other components used for monitoring. A good idea is to minimise the number of components installed on the host, configure them correctly and apply security best practices.

Exposed secrets

Credentials, tokens and passwords are all necessary if we want our services to be able to communicate with other parts of the system. One risk is the way we supply these secret values to the container and the applications running in it. There are different approaches, with varying levels of security, that can be used to prevent any leakage.

Insecure networking

Just like non-containerised applications, containers need to communicate over networks, and some level of attention is necessary to set up secure connections among components.

Container escape vulnerabilities

Containers are meant to run in isolation from the hosts where they are running. In general, container runtimes like “containerd” or “CRI-O” have been heavily tested and are quite reliable but, as always, there are vulnerabilities still to be discovered. Some of these vulnerabilities can let malicious code running inside a container escape out onto the host. Due to the severity of this, stronger isolation mechanisms can be worth considering.

Some other risks related to containers, but not directly about the containers themselves, are:

  • Attacks on the code repositories of applications deployed in the containers, poisoning them with malicious code.
  • Hosts accessible from the Internet should be protected as expected with other tools like firewalls, identity and access management systems, secure network configurations and others.
  • When containers run under an orchestrator, e.g. Kubernetes, a door to new attack vectors is opened. Configurations, permissions or access not controlled properly can give attackers access to our systems.

As we can see, some of the attack vectors are similar to the ones existing in more mature areas like networking or web applications but, due to the abstraction and the easy-to-use approach, container security is, unfortunately, often left out of the considerations.

Reference: “Container Security by Liz Rice (O’Reilly). Copyright 2020 Vertical Shift Ltd., 978-1-492-05670-6”


PostgreSQL: Advisory Locks

Today, we are going to talk about PostgreSQL advisory locks. This kind of lock is created by the application and its developers and only has meaning inside the application; PostgreSQL does not enforce its use and it is there to fulfil a business- or code-specific case. I was going to try to explain them and add some literature around them but, after reading the PostgreSQL documentation (which can be found here), I do not think it is necessary because the definition is easy to understand and, besides, on the same page we can find the other types of locks available, giving us some extra context. Instead, we are going to see some real-world code as an example.

Let’s say we have our shiny service running multiple instances at the same time in our production environment and, in that service, we run a scheduled task that updates one of our database tables, adding a sequence number to the existing rows (buildings) for each of the existing cities. Something like:

id (uuid) | city (text) | building (text)  | registered (timestamp)  | occurrence (bigint)
e6448a82  | London      | British Museum   | 2021/02/01 13:00:00.000 | null
97347903  | London      | Tower of London  | 2021/02/01 12:59:59.999 | null
7befe492  | Paris       | Eiffel Tower     | 2021/01/31 07:23:34.294 | null
b426681a  | Paris       | Louvre Museum    | 2021/02/01 12:59:59.999 | null
156e1f89  | London      | Big Ben          | 2021/02/01 12:59:59.999 | null
Table ‘buildings’

For the curious minds wondering why we need this ‘occurrence‘ sequence, one use case is an endpoint that allows other systems to synchronise these buildings. We could sort using the ‘registered‘ field but two buildings in the same city can be registered at the same time, making it impossible to guarantee the information is always returned in the same order; this can cause synchronisation problems or even a missing building due to paginated requests. We want to be able to sort them in an immutable way.

Going back to the multiple service instances running the task, we can run into ugly situations where one task is already working and in the middle of updating a city when another task in a different instance starts processing the same city, especially if we work in batches due to the huge amount of data we store.

One simple solution to this is to use advisory locks, allowing us, the developers, to lock a city while the task is updating it. For this purpose, PostgreSQL offers us two nice functions to work with:

  • pg_advisory_lock: Obtains an exclusive session-level advisory lock, waiting if necessary.
  • pg_try_advisory_lock: Obtains an exclusive session-level advisory lock if available. This will either obtain the lock immediately and return ‘true‘, or return ‘false‘ without waiting if the lock cannot be acquired immediately.

The full list of system administration functions related with advisory locks can be found here.
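Just to get a feel for them, from a psql session they can be exercised directly (42 is an arbitrary application-defined lock key):

-- returns true if the lock was acquired, false otherwise
select pg_try_advisory_lock(42);
-- do the protected work while holding the lock, then release it
select pg_advisory_unlock(42);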

For the purposes of the example code we are going to implement, we will be using the second one because, if one city is already being processed, we do not want to process it again until the next scheduled run.

public void assignOccurrenceSequences() {
    final List<String> cities = buildingDao.retrievePendingCities();

    for (final String city : cities) {
        final int lockId = Math.abs(Hashing.sha256().newHasher()
            .putString(city, StandardCharsets.UTF_8)
            .hash().asInt());

        logger.info("Taking advisory_lock {} for city {} ", lockId, city);
        try (Connection connection = dataSource.getConnection()) {
            connection.setAutoCommit(true);

            final boolean lockObtained;
            // pg_try_advisory_lock returns a boolean column: true if the lock was acquired, false otherwise
            try (Statement statement = connection.createStatement();
                 ResultSet resultSet = statement.executeQuery(format("select pg_try_advisory_lock(%d)", lockId))) {
                resultSet.next();
                lockObtained = resultSet.getBoolean(1);
            }

            if (lockObtained) {
                try {
                    final int updates = buildingDao.populateOccurrenceSequences(city);
                    logger.info("Assigning {} sequences for city {}", updates, city);
                } finally {
                    try (Statement statement = connection.createStatement()) {
                        statement.execute(format("select pg_advisory_unlock(%d)", lockId));
                    }

                    logger.info("Released advisory_lock {} for city {}", lockId, city);
                }
            } else {
                logger.info("advisory_lock {} for city {} already taken", lockId, city);
            }
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}

At the start of each iteration we build a unique lock id by hashing the city name; because the hash is deterministic, every running task computes the same id for the same city. And yes, before someone points it out, we are assuming that ‘city‘ is unique. With this generated lock id, we try to acquire the lock: if we obtain it, we proceed with the update; if not, we skip that city and continue with the rest of the cities.


Fallacies of Distributed Computing

Distributed architecture styles, while much more powerful in terms of performance, scalability and availability than monolithic architecture styles, have significant trade-offs. One of the groups of issues is known as the fallacies of distributed computing.

A fallacy is something that is believed or assumed to be true but is not. When analysed, all of these fallacies are common sense, even things we experience every day but, for some reason, they are sometimes forgotten when designing new distributed systems.

Fallacy #1: The Network is Reliable

OK, with cloud environments we do not trip over network cables anymore but, while networks have become more reliable over time, the fact is that they still remain generally unreliable, and this matters for distributed systems because of their reliance on the network for communication.

Fallacy #2: Latency is Zero

Let’s face it, the network sometimes goes faster and sometimes slower; is there anyone out there who has never seen a streaming film suddenly freeze for a few seconds just when the killer is behind the main character? Communication latency on a network is never zero and it is always higher than local latency. When working with distributed systems we need to consider the average latency and, not only that, we should consider the 95th to 99th percentiles too because the values can differ significantly.

Fallacy #3: Bandwidth is Infinite

We are sure about that, right? Explain that to your three flatmates when you are trying to call home and they are each watching their favourite film on a streaming platform. The fact that a system is distributed increases the amount of information travelling through the network, and every byte matters. A simple request of 200 kilobytes may seem small but multiply it by the number of requests made per second and add all the requests between services performed at the same time; this number grows easily.

Fallacy #4: The Network is Secure

Just two words: “cybercriminals everywhere” (no reason to be scared). The surface area for threats and attacks increases by orders of magnitude when moving from a monolithic to a distributed architecture. We need to secure every endpoint, even when communicating among internal services.

Fallacy #5: The Topology Never Changes

Raise your hand if you think the IT team never changes anything on your network over time. No hands? Good! Routers, hubs, switches, firewalls, networks and appliances, even cloud networks, can change or need updates and modifications that can affect service communication or network latencies.

Fallacy #6: There is Only One Administrator

Have you ever asked someone from IT to do something and later asked a different person in the department about the progress, only to have to explain your request again because it was never logged? That happens, a lot; sometimes things get lost, the coordination is not good enough, the communication is not good enough or… you see where I am going.

Fallacy #7: Transport Cost is Zero

If you have an internet connection at home, your internet provider probably sends you a bill from time to time; if it does not, please stop using your neighbours’ Wi-Fi. It is not exactly the same thing, but it exemplifies that being able to communicate requires certain infrastructure and network topology. The needs of monolithic applications are substantially different from those of distributed systems: servers, firewalls, multiple load balancers, proxies…

Fallacy #8: The Network is Homogeneous

I do not have a mundane example for this one to make it easy to remember, but a real one should be simple enough: a company using multiple cloud providers at the same time. All of them are going to work well initially but not all of them have been built and tested in exactly the same way. There can be differences in the services, like latency or reliability: basically, everything named in the previous fallacies.

Reference: “Fundamentals of Software Architecture by Mark Richards and Neal Ford (O’Reilly). Copyright 2020 Mark Richards, Neal Ford, 978-1-492-04345-4”


AWS CDK Intro

During the last few years, we have been hearing about a lot of new practices applying to Software Development and DevOps. One of these topics is Infrastructure as Code. Probably, in this space, two of the most well-known solutions are Terraform and CloudFormation.

We have already discussed Terraform on this blog. If you take a look at the basic code in that example, or you are already using it, you are probably aware that when the infrastructure grows, the Terraform code, being quite verbose, grows fast too and, unless we have a very good code structure, it can get very messy. The same can be said about CloudFormation. In addition, they are not as developer-friendly as common programming languages and they need to be learned as a new language.

To solve the first problem, code turning messy over time, there are some ways to split the Terraform code, like creating modules but, from the projects I have taken a look at and the articles I have read about it, there seems to be no agreement about the best way to split the code and, usually, if you do not work regularly with this kind of project, it is very hard to find things and move around.

Trying to solve this problem and the second one, using languages well known by developers, there are projects like Pulumi that bring infrastructure code to familiar programming languages and tools, including packaging and APIs.

Investigating this a little bit, I found another one, the AWS Cloud Development Kit (AWS CDK), which is a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. As said, AWS CDK is initially designed for CloudFormation but there is an implementation to generate Terraform code.

Today’s purpose is just to play a little bit with this technology, only the variant that generates CloudFormation code, and not to compare them (I do not know enough about it to compare them… yet).

I was going to write a long post with examples but I found a great introductory workshop offered by AWS that does the job nicely. For this reason, I am just leaving my project here on GitHub.

One of the nice things I have found about AWS CDK is that it supports multiple languages:

  • TypeScript
  • JavaScript
  • Python
  • Java
  • C#

And, after writing a few lines, it feels very comfortable to write code using Java APIs and all the power of the many tools that exist in the Java ecosystem. And, of course, to use Java packages to organise the infrastructure code.
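Just as a taste of what it looks like, here is a minimal sketch of a stack in Java, roughly in the style of the AWS workshop, assuming the CDK v1 Java libraries are on the classpath (the stack and bucket names are made up):

import software.amazon.awscdk.core.App;
import software.amazon.awscdk.core.Construct;
import software.amazon.awscdk.core.Stack;
import software.amazon.awscdk.services.s3.Bucket;

public class HelloCdkStack extends Stack {

    public HelloCdkStack(final Construct scope, final String id) {
        super(scope, id);

        // A versioned S3 bucket, defined with plain Java instead of a YAML/JSON template
        Bucket.Builder.create(this, "MyFirstBucket")
                .versioned(true)
                .build();
    }

    public static void main(final String[] args) {
        final App app = new App();
        new HelloCdkStack(app, "HelloCdkStack");
        // Produces the CloudFormation template consumed by 'cdk synth' / 'cdk deploy'
        app.synth();
    }
}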

Just another alternative in our toolbox that seems worth exploring.


Defining Software Architecture

The definition of software architecture is something that, for a long time, the industry as a whole has not been able to agree on. In some cases it is defined as the blueprint of a system, in others as the roadmap for developing a system, with all the options in between.

The truth is that it is both things and, probably, much more than that. To try to figure out what it is (I think we are still far from a formal definition), we can focus on what is analysed when we take a look at concrete architectures:

  • Structure
  • Architecture characteristics
  • Architecture decisions
  • Design principles

Structure

When we talk about the structure, we are referring to the type or types of architecture styles selected to implement a system, such as microservices, layered, or microkernel. These styles alone do not describe an architecture, only its structure.

Architecture characteristics

The architecture characteristics define the quality attributes of a system, the “-ilities” the system must support. These characteristics are not related to the business functionality of the system but to its proper functioning. They are sometimes known as non-functional requirements. Some of them are:

Availability     | Reliability     | Testability
Scalability      | Security        | Agility
Fault Tolerance  | Elasticity      | Recoverability
Performance      | Deployability   | Learnability
Architecture characteristics

A long list of them, maybe too long, can be found in one of the articles on Wikipedia: List of system quality attributes.

Architecture decisions

Architecture decisions define the rules for how a system should be built. They form the constraints of the system and inform the development teams of what is and is not allowed when building it.

An example is the decision about which parts of the system should have access to the databases, deciding that only the business and service layers can access them and excluding the presentation layer.

When one of these decisions needs to be broken due to constraints in one part of the system, this can be done using a variance.

Design principles

Design principles are guidelines rather than hard rules to follow, things like preferring asynchronous over synchronous communication within a microservices architecture. They express a preferred way of doing things but this does not mean developers cannot take different approaches in concrete situations.

Reference: “Fundamentals of Software Architecture by Mark Richards and Neal Ford (O’Reilly). Copyright 2020 Mark Richards, Neal Ford, 978-1-492-04345-4”


Cache: Spring Boot + Redis

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Redis.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.

— Redis web page —

Let’s do it.

As a base project, we are going to use code similar to that written for the previous articles: “Cache: Spring Boot + Ehcache” or “Cache: Spring Boot + Caffeine”.

An extra step we need to take here is the creation of a ‘docker-compose.yml‘ file to run Redis. We are going to be using the official image provided by Docker Hub. The content of our compose file will be:

version: '3'

services:
  redis:
    image: redis
    ports:
      - 6379:6379
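With the file in place, Redis can be started in the background with:

docker-compose up -d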

Once we have Redis running and our new endpoint ready to go, it is time to configure the application to use Redis.

First, we are going to create our configuration class. To activate the caching capabilities in Spring, we annotate it with the following (a minimal sketch of the class follows the list):

  • @Configuration
  • @EnableCaching
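The class itself can be as small as this (the class name is arbitrary):

import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Configuration;

// Enables Spring's annotation-driven cache support; the Redis specifics come from auto-configuration
@Configuration
@EnableCaching
public class CacheConfiguration {
}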

And, surprisingly, that’s all the Java configuration we need to write because Spring auto-configuration takes care of the rest. To allow this, we need to add our Redis properties to the ‘application.properties‘ file.

spring.cache.type=redis
spring.redis.host=localhost
spring.redis.port=6379

As simple as that. Now, if we have the Docker container running, when we start our application it will be able to talk to Redis.

Now on the service, we just need to add the appropriate annotation to indicate we want to use the cache.

@Cacheable(value = "md5-cache")
@Override
public String generateMd5(final String text) {
    log.info("Generating the MD5 hash...");

    try {
        final MessageDigest md = MessageDigest.getInstance("MD5");

        md.update(text.getBytes());

        return DatatypeConverter.printHexBinary(md.digest()).toUpperCase();
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("Unable to get MD5 instance");
    }
}

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoints, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/hola

The result should be something like this:

2020-11-01 10:30:06.297 : Generating the MD5 hash...

As we can see, invoking the endpoint multiple times only produced the log line once and, from that point on, any invocation result is taken from the cache.

Obviously, this is a pretty simple example, but it shows how caching can help us increase the performance of our system for more complex operations.

As usual, you can find the code here.
