Fallacies of Distributed Computing

Distributed architecture styles, while much more powerful in terms of performance, scalability and availability than monolithic architecture styles, have significant trade-offs. One group of these issues is known as the fallacies of distributed computing.

A fallacy is something that is believed or assumed to be true but is not. All the fallacies, when analysed, are common sense; they are even things we experience every day but, for some reason, they are sometimes forgotten when designing new distributed systems.

Fallacy #1: The Network is Reliable

OK, with cloud environments we do not trip over network cables anymore but, while networks have become more reliable over time, the fact is that they remain generally unreliable, and this influences distributed systems because of their reliance on the network for communication.

Fallacy #2: Latency is Zero

Let’s face it, the network sometimes goes faster and sometimes slower. Is there anyone out there who has never seen a streaming film suddenly freeze for a few seconds just when the killer is behind the main character? Communication latency on a network is never zero and it is always bigger than local latencies. When working with distributed systems we need to consider the average latency and, not only that, we should consider the 95th to 99th percentiles too because the values can differ significantly.

Fallacy #3: Bandwidth is Infinite

We are sure about that, right? Explain that to your three flatmates when you are trying to call home and they are, independently, watching their favourite film on a streaming platform. The fact that the system is distributed increases the amount of information that travels through the network, and every byte matters. A simple request of 200 kilobytes may seem small, but multiply it by the number of requests made per second and include all the requests exchanged between services at the same time: for example, 200 KB per request at 2,000 requests per second is already around 400 MB per second. This number can grow easily.

Fallacy #4: The Network is Secure

Just two words: “cybercriminals everywhere” (no reason to be scared). The surface area for threats and attacks increases by magnitudes when moving from a monolithic to a distributed architecture. This means we need to secure every endpoint, even when services communicate only with other internal services.

Fallacy #5: The Topology Never Changes

Raise your hand if you think IT members never change anything on your network over time. No hands? Good! Routers, hubs, switches, firewalls, network appliances, even cloud networks can change or need updates and modifications that can affect service communications or latencies on the network.

Fallacy #6: There is Only One Administrator

Have you ever asked someone from IT to do something and, when checking the progress later with a different person in the department, had to explain your request again because it was never logged? That happens, a lot; sometimes things get lost, the coordination is not good enough, the communication is not good enough or… you see where I am going.

Fallacy #7: Transport Cost is Zero

If you have an internet connection at home, your internet provider probably sends you a bill from time to time; if it does not, please stop using your neighbours’ Wi-Fi. It is not exactly the same, but it exemplifies that certain infrastructure and network topology are necessary to be able to communicate. The needs of monolithic applications are substantially different from the needs of distributed systems: servers, firewalls, multiple load balancers, proxies…

Fallacy #8: The Network is Homogeneous

I do not have a mundane example for this one to make it easy to remember, but a real one should be simple enough: a company using multiple cloud providers at the same time. All of them are going to work well initially, but not all of them have been built and tested in exactly the same way. There can be differences in the services, like latency or reliability; basically, everything named in the previous fallacies.

Reference: “Fundamentals of Software Architecture by Mark Richards and Neal Ford (O’Reilly). Copyright 2020 Mark Richards, Neal Ford, 978-1-492-04345-4”


AWS CDK Intro

During the last few years, we have been hearing about a lot of new practices applied to software development and DevOps. One of these topics is Infrastructure as Code. Probably, in this space, two of the most well-known solutions are Terraform and CloudFormation.

We have already discussed Terraform on this blog. If we take a look at the basic code in that example, or if you are already using it, you are probably aware that when the infrastructure grows, the Terraform code, being quite verbose, grows fast too and, unless we have a very good code structure, it can get very messy. The same can be said about CloudFormation. In addition, they are not as developer-friendly as common programming languages are, and they need to be learned as a new language.

To solve the first problem, the code turning messy over time, there are some measures to split the Terraform code, like creating modules but, for most of the projects I have taken a look at and the articles I have read about it, there seems to be no agreement about the best way to split the code and, usually, if you do not work regularly with this kind of project, it is very hard to find things and move around.

Trying to solve this problem and the second one, using languages well known by developers, there are projects like Pulumi that bring infrastructure code to familiar programming languages and tools, including packaging and APIs.

Investigating a little bit about this, I have found another one, the AWS Cloud Development Kit (AWS CDK), which is a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. As said, AWS CDK was initially designed for CloudFormation but there is also an implementation that generates Terraform code.

Today’s purpose is just to play a little bit with this technology, only the flavour that generates CloudFormation code, and not to compare them (I do not know enough about it to compare them… yet).

I was going to write a long post with examples but I have found a great introduction workshop offered by AWS that does the job nicely. For this reason, I am just leaving here my project on GitHub.

One of the nice things I have found about AWS CDK is that it supports multiple languages:

  • TypeScript
  • JavaScript
  • Python
  • Java
  • C#

And, after writing a few lines, it feels very comfortable to be writing code using Java APIs and all the power of the multiple tools that exist in the Java ecosystem. And, of course, to be using Java packages to organise the infrastructure code.
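
As a quick illustration, this is a minimal sketch of what a CDK stack written in Java can look like. The stack and bucket names are made up, and it assumes the CDK v1 Java libraries under ‘software.amazon.awscdk’ are on the classpath:

import software.amazon.awscdk.core.App;
import software.amazon.awscdk.core.Construct;
import software.amazon.awscdk.core.Stack;
import software.amazon.awscdk.services.s3.Bucket;

public class DemoStack extends Stack {

    public DemoStack(final Construct scope, final String id) {
        super(scope, id);

        // A versioned S3 bucket, defined as plain Java code.
        Bucket.Builder.create(this, "DemoBucket")
            .versioned(true)
            .build();
    }

    public static void main(final String[] args) {
        final App app = new App();
        new DemoStack(app, "DemoStack");
        app.synth(); // Generates the CloudFormation template.
    }
}

With the CDK CLI, ‘cdk synth’ would then produce the equivalent CloudFormation template and ‘cdk deploy’ would provision it.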

Just another alternative in our toolbox that seems worth exploring.


Defining Software Architecture

The definition of software architecture is something that, for a long time, the industry as a whole has not been able to agree on. In some cases, it is defined as the blueprint of a system and, in others, as the roadmap for developing a system, with all the options in between.

The truth is that it is both things and, probably, much more than that. To try to figure out what it is (I think we are still far from a formal definition), we can focus on what is analysed when we take a look at concrete architectures:

  • Structure
  • Architecture characteristics
  • Architecture decisions
  • Design principles

Structure

When we talk about the structure we are referring to the type or types of architecture styles selected to implement a system, such as microservices, layered, or microkernel. These styles do not describe an architecture but its structure.

Architecture characteristics

The architecture characteristics define the quality attributes of a system, the “-ilities” the system must support. These characteristics are not related to the business functionality of the system but to its proper operation. They are sometimes known as non-functional requirements. Some of them are:

  • Availability
  • Reliability
  • Testability
  • Scalability
  • Security
  • Agility
  • Fault Tolerance
  • Elasticity
  • Recoverability
  • Performance
  • Deployability
  • Learnability

A long list of them, maybe too long, can be found in the Wikipedia article “List of system quality attributes”.

Architecture decisions

Architecture decisions define the rules for how a system should be built. They form the constraints of a system and inform the development teams of what is allowed and what is not when building the system.

An example is the decision about who should have access to the databases of the system, for instance deciding that only the business and service layers can access them, excluding the presentation layer.
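
As an illustration only (not from the book), a decision like that can even be checked automatically with a library such as ArchUnit. A minimal sketch, assuming hypothetical package names like ‘..presentation..’ and ‘..persistence..’:

import com.tngtech.archunit.core.domain.JavaClasses;
import com.tngtech.archunit.core.importer.ClassFileImporter;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

public class LayerAccessCheck {

    public static void main(final String[] args) {
        // Hypothetical root package; adjust to the real application.
        final JavaClasses classes = new ClassFileImporter().importPackages("dev.binarycoders.app");

        // The presentation layer must not depend on the persistence layer directly.
        noClasses().that().resideInAPackage("..presentation..")
            .should().dependOnClassesThat().resideInAPackage("..persistence..")
            .check(classes);
    }
}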

When some of these decisions need to be broken due to constraints at one part of the system, this can be done using a variance.

Design principles

Design principles are guidelines rather than hard rules to follow: things like choosing between synchronous and asynchronous communication within a microservices architecture. They express a preferred way of doing things, but this does not mean developers cannot take different approaches in concrete situations.

Reference: “Fundamentals of Software Architecture by Mark Richards and Neal Ford (O’Reilly). Copyright 2020 Mark Richards, Neal Ford, 978-1-492-04345-4”


Cache: Spring Boot + Redis

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Redis.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.

— Redis web page —

Let’s do it.

As a base project, we are going to use code similar to the one written for the previous articles: “Cache: Spring Boot + Ehcache” or “Cache: Spring Boot + Caffeine“.

An extra step we need to take here is the creation of a ‘docker-compose.yml‘ file to run Redis. We are going to be using the official image provided by Docker Hub. The content of our compose file will be:

version: '3'

services:
  redis:
    image: redis
    ports:
      - 6379:6379

Once we have Redis running (for example, with ‘docker-compose up -d’) and our new endpoint ready to go, it is time to start configuring Redis.

First, we are going to create our configuration class; a minimal sketch of it follows the list below. To activate the cache capabilities on Spring we can use the configuration and enable caching annotations:

  • @Configuration
  • @EnableCaching
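
A minimal sketch of that configuration class could be (the class name ‘CacheConfig’ is just an assumption):

import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig {
    // Nothing else is needed here; Spring Boot auto-configuration does the rest.
}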

And, surprisingly, that is all the Java configuration we need to write because Spring Boot auto-configuration takes care of the rest. To allow this (and assuming the Redis starter, ‘spring-boot-starter-data-redis’, is on the classpath), we need to add our Redis properties to the ‘application.properties‘ file.

spring.cache.type=redis
spring.redis.host=localhost
spring.redis.port=6379

As simple as that. Now, if we have the Docker container running, when we start our application it will be able to talk to Redis.
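
In case we want to mimic the expiration policy used in the previous articles, one option is to add a ‘RedisCacheConfiguration’ bean to the configuration class so the auto-configuration picks it up. This is only a sketch and the 10-second TTL is an assumption to match the other examples:

@Bean
public RedisCacheConfiguration redisCacheConfiguration() {
    // Entries expire 10 seconds after being written (assumed value, matching the other cache posts).
    return RedisCacheConfiguration.defaultCacheConfig()
        .entryTtl(Duration.ofSeconds(10));
}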

Now on the service, we just need to add the appropriate annotation to indicate we want to use the cache.

@Cacheable(value = "md5-cache")
@Override
public String generateMd5(final String text) {
    log.info("Generating the MD5 hash...");

    try {
        final MessageDigest md = MessageDigest.getInstance("MD5");

        md.update(text.getBytes());

        return DatatypeConverter.printHexBinary(md.digest()).toUpperCase();
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("Unable to get MD5 instance");
    }
}

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoints, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/hola

The result should be something like this:

2020-11-01 10:30:06.297 : Generating the MD5 hash...

As we can see, invoking the endpoint multiple times only created the first log line and, from this point, any further invocation will be served from the cache.

Obviously, this is a pretty simple example but, this can help us to increase the performance of our system for more complex operations.

As usual, you can find the code here.


Cache: Spring Boot + Caffeine

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Caffeine.

Caffeine is a high performance, near optimal caching library based on Java 8. For more details, see our user’s guide and browse the API docs for the latest release.

— Caffeine wiki —

Let’s do it.

As a base project, we are going to use code similar to the one written for the previous article “Cache: Spring Boot + Ehcache“.

The only thing we are going to change is that we are going to duplicate the existing endpoint, to be able to try two different ways of working with Caffeine.
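
A possible shape of those duplicated endpoints, based on the ‘curl’ calls used later in the post (the controller method names are assumptions), could be:

@GetMapping(value = "/spring/{text}", produces = APPLICATION_JSON_VALUE)
public HttpEntity<String> generateSpring(@PathVariable final String text) {
    // Served through the Spring-managed Caffeine cache.
    return ResponseEntity.ok(hashService.generateMd5SpringCache(text));
}

@GetMapping(value = "/manual/{text}", produces = APPLICATION_JSON_VALUE)
public HttpEntity<String> generateManual(@PathVariable final String text) {
    // Served through the manually created Caffeine cache.
    return ResponseEntity.ok(hashService.generateMd5ManualCache(text));
}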

Once we have our new endpoint ready to go, it is time to start configuring Caffeine. We are going to take two different approaches:

  1. Make use of Spring injection capabilities.
  2. More manual approach.

Leveraging Spring injection capabilities

First, we are going to create our configuration class. To activate the cache capabilities on Spring we can use the configuration and enable caching annotations:

  • @Configuration
  • @EnableCaching

With this, we can now add the beans to create our cache and configure Caffeine appropriately.

@Bean
@SuppressWarnings("all")
public Caffeine caffeineConfig() {
    return Caffeine.newBuilder()
        .maximumSize(50)
        .expireAfterWrite(10, TimeUnit.SECONDS)
        .removalListener(CacheEventLogger.removalListener());
}

@Bean
@SuppressWarnings("all")
public CacheManager cacheManager(final Caffeine caffeine) {
    final CaffeineCacheManager caffeineCacheManager = new CaffeineCacheManager();

    caffeineCacheManager.setCaffeine(caffeine);

    return caffeineCacheManager;
}

As you can see, something pretty simple. I have tried to mimic the configuration set for the Ehcache example in the previous article. If you have not read it, you can check it now. A summary of this configuration is:

  • Cache size: 50 entries.
  • And the expiration policy: entries expire 10 seconds after they are written.

Now on the service, we just need to add the appropriate annotation to indicate we want to use the cache.

@Cacheable(cacheNames = MD5_CACHE_ID)
@Override
public String generateMd5SpringCache(final String text) {
    log.info("The value was not cached by Spring");
    return generateMd5(text);
}

That simple.

Manual approach

We have the possibility of creating the cache manually. This can be desirable for multiple reasons: not having Spring available, wanting component isolation, not wanting to deal with the cache manager and multiple caches or, for whatever reason, because we, as developers, decide it fits our use case best.

To do this manual configuration we just need to exclude the configuration class and the beans’ creation, create the cache in our service class and invoke it when a request arrives.

final LoadingCache<String, String> md5Cache = Caffeine.newBuilder()
    .maximumSize(50)
    .expireAfterWrite(10, TimeUnit.SECONDS)
    .removalListener(CacheEventLogger.removalListener())
    .build(this::generateMd5Wrapper);

@Override
public String generateMd5ManualCache(final String text) {
    return md5Cache.get(text);
}

Nothing too fancy. It is worth a note about the method ‘generateMd5Wrapper‘: it is completely unnecessary; the only reason it has been created is to be able to write an extra log line for the demo and to make the effects of the cache visible.
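
Judging by the log output shown below, that wrapper probably looks something like this (a sketch, not the exact code):

private String generateMd5Wrapper(final String text) {
    // Extra log line, only there to make the cache behaviour visible in the demo.
    log.info("The value was not cached manually");
    return generateMd5(text);
}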

The last thing we have defined is a removal listener to log when an object is removed from the cache. Again, this is just for demo purposes and, it is not necessary.

public static RemovalListener<String, String> removalListener() {
    return (String key, String graph, RemovalCause cause) ->
        log.info("Key {} was removed ({})", key, cause);
}

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoints, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/spring/hola
curl http://localhost:8080/api/hashes/manual/hola

The result should be something like this:

2020-10-31 08:15:19.610 : The value was not cached by Spring
2020-10-31 08:15:35.316 : The value was not cached by Spring
2020-10-31 08:15:35.317 : Key hola was removed (EXPIRED)
2020-10-31 08:15:39.717 : The value was not cached manually
2020-10-31 08:15:55.443 : The value was not cached manually
2020-10-31 08:15:55.443 : Key hola was removed (EXPIRED)

As we can see, invoking the endpoints multiple times only created the first log line for each of them and, it is only after waiting for some time (more than 10 seconds) that the cache entry expires and gets re-created.

Obviously, this is a pretty simple example but, this can help us to increase the performance of our system for more complex operations.

As usual, you can find the code here.


Cache: Spring Boot + Ehcache

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Ehcache.

Ehcache is an open-source, standards-based cache that boosts performance, offloads your database, and simplifies scalability. It’s the most widely-used Java-based cache because it’s robust, proven, full-featured, and integrates with other popular libraries and frameworks. Ehcache scales from in-process caching, all the way to mixed in-process/out-of-process deployments with terabyte-sized caches.

— Ehcache web page —

In our case, we are going to use Ehcache version 3, as it provides an implementation of a JSR-107 cache manager, together with Spring Boot to create a simple endpoint that returns the MD5 hash of a given text.

Let’s do it.

We are going to start a Maven project and add various dependencies:

  • spring-boot-starter-parent 2.3.4.RELEASE (parent of our project)
  • spring-boot-starter-web (version managed by the parent)
  • spring-boot-starter-actuator (version managed by the parent)
  • spring-boot-starter-cache (version managed by the parent)
  • lombok (version managed by the parent)
  • javax.cache:cache-api 1.1.1
  • org.ehcache:ehcache 3.8.1

Now, let’s create the endpoint and the service we are going to cache. Assuming you, the reader, have some knowledge of Spring, I am not going to go into detail on this.

// Controller code
@RestController
@RequestMapping(value = "/api/hashes")
@AllArgsConstructor
public class HashController {

    private final HashService hashService;

    @GetMapping(value = "/{text}", produces = APPLICATION_JSON_VALUE)
    public HttpEntity<String> generate(@PathVariable final String text) {
        return ResponseEntity.ok(hashService.generateMd5(text));
    }
}

// Service code
@Service
public class HashServiceImpl implements HashService {

    @Override
    public String generateMd5(final String text) {
        try {
            final MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(text.getBytes());

            return DatatypeConverter.printHexBinary(md.digest()).toUpperCase();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("Unable to get MD5 instance");
        }
    }
}

Simple stuff. Let’s now add the cache capabilities.

First, we will add the cache configuration to our service as a new annotation.

@Cacheable(value = "md5-cache")

This defines the name that is going to be used for this cache, ‘md5-cache’. As a key, the content of the method parameter will be used.

The next step is to add the configuration. To activate the cache capabilities on Spring we can use the configuration and enable caching annotations:

  • @Configuration
  • @EnableCaching

Even with this, and using the Spring Boot auto-configuration, no caches are created by default and we need to create them. There are two ways this can be done:

  • Using an XML file with the configuration.
  • Programmatically.

If you are a follower of this blog or you have read some of the existing posts, you have probably realised I am not a big fan of XML configuration and I prefer to do things programmatically, and this is what we are going to do. In any case, I will try to add the XML equivalent to the configuration but, it has not been tested.

The full configuration is:

@Bean
CacheManager getCacheManager() {
    final CachingProvider provider = Caching.getCachingProvider();
    final CacheManager cacheManager = provider.getCacheManager();

    final CacheConfigurationBuilder<String, String> configurationBuilder =
        CacheConfigurationBuilder.newCacheConfigurationBuilder(
                String.class, String.class,
                ResourcePoolsBuilder.heap(50)
                    .offheap(10, MemoryUnit.MB))
            .withExpiry(ExpiryPolicyBuilder.timeToIdleExpiration(Duration.ofSeconds(10)));

    final CacheEventListenerConfigurationBuilder asynchronousListener = CacheEventListenerConfigurationBuilder
        .newEventListenerConfiguration(new CacheEventLogger(), EventType.CREATED, EventType.EXPIRED)
        .unordered().asynchronous();

    cacheManager.createCache("md5-cache",
        Eh107Configuration.fromEhcacheCacheConfiguration(configurationBuilder.withService(asynchronousListener)));

    return cacheManager;
}

But, let’s explain it in more detail.

final CacheConfigurationBuilder<String, String> configurationBuilder =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
            String.class, String.class,
            ResourcePoolsBuilder.heap(50)
                .offheap(10, MemoryUnit.MB))
        .withExpiry(ExpiryPolicyBuilder.timeToIdleExpiration(Duration.ofSeconds(10)));

Here we can see some of the cache characteristics:

  • The type of data: String for both, key and value.
  • Cache size: a heap of 50 entries plus 10 MB off-heap (obviously absurd numbers but good enough to exemplify).
  • And the expiration policy: ‘Time to Idle’ of 10 seconds. It could also be defined as ‘Time to Live’ instead.

The next thing we are creating is a cache listener to log the operations:

final CacheEventListenerConfigurationBuilder asynchronousListener = CacheEventListenerConfigurationBuilder
            .newEventListenerConfiguration(new CacheEventLogger(), EventType.CREATED, EventType.EXPIRED)
            .unordered().asynchronous();

Basically, we are going to log a message when the cache creates or expires an entry. Other events can be added.
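
The ‘CacheEventLogger’ class is not shown here but, judging by the log output at the end of the post, it is roughly a ‘CacheEventListener’ like this (a sketch, not the exact code):

@Slf4j
public class CacheEventLogger implements CacheEventListener<Object, Object> {

    @Override
    public void onEvent(final CacheEvent<? extends Object, ? extends Object> event) {
        // Logs every received cache event with its type, key and values.
        log.info("Type: {}, Key: {}, Old: {}, New: {}",
            event.getType(), event.getKey(), event.getOldValue(), event.getNewValue());
    }
}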

And, finally, we create the cache:

cacheManager.createCache("md5-cache",
    Eh107Configuration.fromEhcacheCacheConfiguration(configurationBuilder.withService(asynchronousListener)));

With the name matching the one we have used on the service annotation.

The XML configuration should be something like:

<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.ehcache.org/v3"
    xmlns:jsr107="http://www.ehcache.org/v3/jsr107"
    xsi:schemaLocation="
        http://www.ehcache.org/v3 http://www.ehcache.org/schema/ehcache-core-3.0.xsd
        http://www.ehcache.org/v3/jsr107 http://www.ehcache.org/schema/ehcache-107-ext-3.0.xsd">

<cache alias="md5-cache">
    <key-type>java.lang.String</key-type>
    <value-type>java.lang.String</value-type>
    <expiry>
        <tti unit="seconds">10</tti>
    </expiry>

    <listeners>
        <listener>
            <class>dev.binarycoders.ehcache.utils.CacheEventLogger</class>
            <event-firing-mode>ASYNCHRONOUS</event-firing-mode>
            <event-ordering-mode>UNORDERED</event-ordering-mode>
            <events-to-fire-on>CREATED</events-to-fire-on>
            <events-to-fire-on>EXPIRED</events-to-fire-on>
        </listener>
    </listeners>

    <resources>
        <heap unit="entries">50</heap>
        <offheap unit="MB">10</offheap>
    </resources>
</cache>
</config>

Just remember we need to add a property to our ‘application.properties’ file if we choose the XML approach.

spring.cache.jcache.config=classpath:ehcache.xml

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoint, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/hola

The result should be something like this:

2020-10-25 11:29:22.364 : Type: CREATED, Key: hola, Old: null, New: D41D8CD98F00B204E9800998ECF8427E
2020-10-25 11:29:42.707 : Type: EXPIRED, Key: hola, Old: D41D8CD98F00B204E9800998ECF8427E, New: null
2020-10-25 11:29:42.707 : Type: CREATED, Key: hola, Old: null, New: D41D8CD98F00B204E9800998ECF8427E

As we can see, invoking the endpoint multiple times only created the first log line and, it is only after waiting for some time (more than 10 seconds) that the cache entry expires and gets re-created.

Obviously, this is a pretty simple example but, this can help us to increase the performance of our system for more complex operations.

As usual, you can find the code here.


Add a header to Spring RestTemplate

Today, just a short code snippet. How to add a header to the ‘RestTemplate’ on Spring.

public class HeaderRequestInterceptor implements ClientHttpRequestInterceptor {

    private final String headerName;
    private final String headerValue;

    public HeaderRequestInterceptor(String headerName, String headerValue) {
        this.headerName = headerName;
        this.headerValue = headerValue;
    }

    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution) throws IOException {
        request.getHeaders().set(headerName, headerValue);
        return execution.execute(request, body);
    }
}

Now, we add it to our ‘RestTemplate’:

List<ClientHttpRequestInterceptor> interceptors = new ArrayList<ClientHttpRequestInterceptor>();
interceptors.add(new HeaderRequestInterceptor("X-Custom-Header", "<custom_value>"));

RestTemplate restTemplate = new RestTemplate();
restTemplate.setInterceptors(interceptors);

And, that’s all.

Just an extra side note: as of Spring Framework 5, a new HTTP client called ‘WebClient’ has been added and it is assumed that ‘RestTemplate’ will be deprecated at some point. If you are starting a new application, especially if you are using the ‘WebFlux’ stack, it will be a better choice to use the new client.
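
For reference, a rough equivalent with ‘WebClient’ would be setting the header as a default header on the builder. This is just a sketch, assuming ‘spring-boot-starter-webflux’ is available, and the URL is a placeholder:

WebClient webClient = WebClient.builder()
    .defaultHeader("X-Custom-Header", "<custom_value>")
    .build();

// Every request sent through this instance carries the custom header.
final String response = webClient.get()
    .uri("http://localhost:8080/some-endpoint") // Placeholder URL.
    .retrieve()
    .bodyToMono(String.class)
    .block();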


Git branch on terminal prompt

There are a lot of GUI tools to interact with Git these days but a lot of us are still using the terminal for that. One thing I have found very useful is, once you are in a folder with a Git repository, being able to see the branch in use on the terminal prompt. We can achieve this with a few steps.

Step 1

Let’s check the actual definition of our prompt. Mine looks something like:

echo $PS1
\[\e]0;\u@\h: \w\a\]\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]$

Step 2

Open the file ‘~/.bashrc‘ with our favourite editor. In my case, ‘Vim’.

Step 3

Let’s create a small function to figure out the branch we are on once we are in a folder with a Git repository.

# Prints the current git branch between parentheses, or nothing if we are not inside a repository.
git_branch() {
  git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/(\1)/'
}

Step 4

Let’s redefine the variable ‘PS1’ to include the function we have just defined. In addition, let’s add some colour to the branch name to be able to see it easily. Taking my initial values and adding the function call the result should be something like:

export PS1="\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] \[\033[00;32m\]\$(git_branch)\[\033[00m\]\$ "

And, after reloading the configuration (for example, with ‘source ~/.bashrc’), the prompt should show the current branch name next to the working directory whenever we are inside a Git repository.
