Maintaining compatibility

Most companies nowadays are implementing, or want to implement, architectures based on microservices. While this can help them overcome multiple challenges, it brings new challenges of its own to the table.

In this article, we are going to discuss a very concrete one: maintaining compatibility when we change the objects that the different microservices in our systems use to communicate. Sometimes they are called API objects, Data Transfer Objects (DTOs) or similar. More concretely, we are going to use Jackson and JSON as the serialisation (marshalling and unmarshalling) mechanism.

There are other methods, technologies and ways to achieve this, but this is one more tool to keep on our belt and be aware of, so we can make informed decisions when faced with this challenge in the future.

Maintaining compatibility is a very broad term, so to establish what we are talking about and ensure we are on the same page, let us see a few examples of real situations we are trying to mitigate:

  • Deploying breaking changes when releasing new features or services due to changes in the objects used to communicate between services, especially when, at deployment time, there is a short period where the old and the new versions are both running (almost impossible to avoid unless you stop both services, deploy them and restart them again).
  • Being forced to follow a strict deployment order for our different services. We should be able to deploy in any order and whenever it best suits the business and the different teams involved.
  • Being forced, due to a change to one object in one concrete service, to deploy multiple services not directly involved in or affected by the change.
  • Related to the previous point, being forced to change other services because of a small data or structural change. An example of this would be objects that travel through different systems, being enriched with extra information along the way, and are finally shown to the user by the last one.

To exemplify the kind of situation we can find ourselves in, let us take a look at the image below. In this scenario, we have four different services:

  • Service A: It stores some basic user information such as the first name and last name of a user.
  • Service B: It enriches the user information with extra information about the job position of the user.
  • Service C: It adds some extra administrative information such as the number of complaints open against the user.
  • Service D: It finally uses all the information about the user to, for example, calculate some advice based on performance and area of work.

All of this is deployed and working on our production environment using the first version of our User object.

At some point, product managers decided the age field should be considered in the calculations, to be able to offer users extra advice based on proximity to retirement. This added requirement is going to create a second version of our User object where the field age is present.

Just one last comment: for simplicity, let us say the communication between services is asynchronous, based on queues.

As we can see in the image, in this situation only services A and D should be modified and deployed. This is what we are trying to achieve and what I mean by maintaining compatibility. But first, let us explore the options we have at this point:

  1. Upgrade all services to the second version of the object User before we start sending messages.
  2. Avoid sending the User from service A to service D, send just an id, and perform a call from service D to recover the User information based on the id.
  3. Keep the unknown fields on an object even if the service processing the message at that point does not know anything about them.
  4. Fail the message, and store it for re-processing until we have upgraded all the services involved. This option is not valid for synchronous communications.

Option 1

As we have described, it implies updating the dependency service-a-user in all the projects. This is possible, but it quickly brings some problems to the table:

  • We need to update not only direct dependencies but indirect ones too, which can be hard to track and easy to miss. In addition, a decision needs to be made about what to do when a dependency is missed: should an error be thrown? Should we fail silently?
  • We have a problem with scenarios where we need to roll back a deployment due to something going wrong. Should we roll back everything? Good luck! Should we try to fix the problem while our system is not behaving properly?
  • Heavy refactoring operations or modifications can make upgrades very hard to perform.

Option 2

Instead of sending the object information in the message, we just send an id and later recover the object information with a REST call. This option, while very useful in multiple cases, is not exempt from problems:

  • What if, instead of just a couple of enrichers, we have a dozen of them and they all need the user information? Should we consolidate all services and send ids for the enriched information, creating stores on the enrichers?
  • If, instead of a queue, other communication mechanisms such as RPC are used, do all the services now need to call service A to recover the User information and do their job? This just creates a cascade of calls.
  • And, under this scenario, we can have inconsistent data if there is any update while the different services are recovering a User.

Option 3

This is the desired option, and the one we are going to dive deep into in this article: using Jackson and JSON to keep the fields even if the processing service does not know everything about them.

Let me add in advance that, as always, there are no silver bullets; there are problems that not even this solution can solve, but it will mitigate most of the ones named above.

One problem we cannot solve with this approach, especially if your company performs “all at once” releases instead of independent ones, is when service B, once deployed, tries to persist some information in service A before the new version of service A has been deployed, or tries to perform a search there using a criterion it does not know yet, in this case the field age. In that scenario, the only thing we can do is throw an error.

Option 4

This option, especially in asynchronous situations where messages can be stored and retried later, can be a possible way to propagate the upgrade. It will slow down our processing capabilities temporarily, and a retry mechanism needs to be in place, but it is doable.

Using Jackson to solve versioning

Renaming a field

Plain and simple: do not do it, especially if it is a client-facing API and not an internal one. It will save you a lot of trouble and headaches. Unfortunately, if we are persisting JSON in our databases, renaming will also require some migrations.

If it needs to be done, think about it again. Really, rethink it. If, after rethinking it, it still needs to be done, a few steps need to be taken:

  1. Update the API object so it uses the new field name, adding @JsonAlias for the old one (see the sketch after this list).
  2. Release and update everything so it uses the renamed field, keeping @JsonAlias for the old field name.
  3. Remove @JsonAlias for the old field name. This is a cleanup step; everything should already work after step two.
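
A minimal sketch of steps one and two, assuming a hypothetical rename of first_name to firstname (the field names are illustrative, not from the original article):

import com.fasterxml.jackson.annotation.JsonAlias;

class User {

    private Long id;

    @JsonAlias("first_name")  // old name, still accepted from services not yet upgraded
    private String firstname; // new name, used when serialising

    private String lastname;

    // getters and setters omitted for brevity
}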

Removing a field

Like in the previous case, do not do it, or think very hard about it before you do it. Again, if you finally must, a few steps need to be followed.

First, consider deprecating the field.
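
A minimal sketch, using a hypothetical nickname field that nobody should rely on any more:

class User {

    private Long id;
    private String firstname;
    private String lastname;

    /**
     * @deprecated No longer used; it will be removed in a future version.
     */
    @Deprecated
    private String nickname;

    // getters and setters omitted for brevity
}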

If it must be removed:

  1. Explicitly ignore the old property with @JsonIgnoreProperties (see the sketch after this list), so messages from services still sending it do not break.
  2. Remove @JsonIgnoreProperties for the old field name once nothing sends it any more. This is a cleanup step.
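
A minimal sketch of step one, again with a hypothetical nickname field that has just been deleted from the class:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Old producers may still send "nickname"; drop it silently instead of failing.
@JsonIgnoreProperties({"nickname"})
class User {

    private Long id;
    private String firstname;
    private String lastname;

    // getters and setters omitted for brevity
}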

Unknown fields (adding a field)

Ignoring them

The first option is the simplest one: we do not care about new fields. A rare situation, but it can happen. We should just ignore them.
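
A minimal sketch, assuming we really are not interested in anything we do not already know about:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Any property not declared below is silently discarded during deserialisation.
@JsonIgnoreProperties(ignoreUnknown = true)
class User {

    private Long id;
    private String firstname;
    private String lastname;

    // getters and setters omitted for brevity
}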

A note of caution in this scenario: we need to be completely sure we want to ignore all unknown properties. For example, we could miss errors from APIs that return HTTP 200 OK and map the error details onto the response body; if we are silently ignoring unknown fields we will never notice, while in other circumstances the deserialisation would simply crash and make us aware.

Ignoring enums

In a similar way to how we ignore fields, we can ignore enums or, more appropriately, map unknown values to an UNKNOWN value.
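
One way of doing this with plain Jackson is its default-value support for enums; a sketch, using a hypothetical Status enum (the mapper configuration shown is needed for the fallback to kick in):

import com.fasterxml.jackson.annotation.JsonEnumDefaultValue;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

enum Status {
    ACTIVE,
    SUSPENDED,
    @JsonEnumDefaultValue  // any unrecognised value is mapped here
    UNKNOWN
}

class Mappers {
    static ObjectMapper newMapper() {
        // Without this feature, an unknown enum value fails the whole read.
        return new ObjectMapper()
            .enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_USING_DEFAULT_VALUE);
    }
}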

Keeping them

The most common situation is that we want to keep the fields even if they do not mean anything to the service that is currently processing the object, because they will be needed up or down the stream.

Jackson offers us two interesting annotations:

  • @JsonAnySetter
  • @JsonAnyGetter

These two annotations help us to read and write fields even if we do not know what they are.

import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;

import java.util.LinkedHashMap;
import java.util.Map;

class User {
    // Catch-all for properties this service does not know about.
    @JsonAnySetter
    private final Map<String, Object> unknownFields = new LinkedHashMap<>();

    private Long id;
    private String firstname;
    private String lastname;

    // Write the captured properties back out exactly as they arrived.
    @JsonAnyGetter
    public Map<String, Object> getUnknownFields() {
        return unknownFields;
    }

    // getters and setters for the known fields omitted for brevity
}
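
To make the intent concrete, a small hypothetical round trip might look like the following, assuming the known fields are exposed through standard getters and setters:

import com.fasterxml.jackson.databind.ObjectMapper;

class UnknownFieldsDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // A "version 2" payload containing a field this service knows nothing about.
        String v2Json = "{\"id\":1,\"firstname\":\"Jane\",\"lastname\":\"Doe\",\"age\":62}";

        User user = mapper.readValue(v2Json, User.class); // "age" lands in unknownFields

        // When the message is forwarded, "age" is written back untouched.
        System.out.println(mapper.writeValueAsString(user));
    }
}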

Keeping enums

In a similar way to how we keep the fields, we can keep the enums. The best way to achieve this is to map them internally as strings but leave the getters and setters working with the enums.

import com.fasterxml.jackson.annotation.JsonAutoDetect;
import com.fasterxml.jackson.annotation.JsonAutoDetect.Visibility;

@JsonAutoDetect(
    fieldVisibility = Visibility.ANY,
    getterVisibility = Visibility.NONE,
    setterVisibility = Visibility.NONE)
class Process {
    private Long id;
    private String state;

    public void setState(State state) {
        this.state = nameOrNull(state);
    }

    public State getState() {
        return nameOrDefault(State.class, state, State.UNKNOWN);
    }

    public String getStateRaw() {
        return state;
    }

    // Small helpers to convert between the enum and its raw string form.
    private static String nameOrNull(State state) {
        return state == null ? null : state.name();
    }

    private static <E extends Enum<E>> E nameOrDefault(Class<E> type, String name, E defaultValue) {
        try {
            return name == null ? defaultValue : Enum.valueOf(type, name);
        } catch (IllegalArgumentException unknownValue) {
            return defaultValue;
        }
    }
}

enum State {
    READY,
    IN_PROGRESS,
    COMPLETED,
    UNKNOWN
}

It is worth pointing out that the @JsonAutoDetect annotation tells Jackson to ignore the getters and setters and perform the serialisation based on the fields defined.
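
A small hypothetical usage sketch: a state this service does not recognise is reported as UNKNOWN locally, but the original value is preserved and forwarded untouched.

import com.fasterxml.jackson.databind.ObjectMapper;

class KeepingEnumsDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Payload written by a newer service that already knows an "ON_HOLD" state.
        Process process = mapper.readValue("{\"id\":7,\"state\":\"ON_HOLD\"}", Process.class);

        System.out.println(process.getState());    // UNKNOWN: this service cannot interpret it
        System.out.println(process.getStateRaw()); // ON_HOLD: the original value is kept

        // The original value also survives the trip back out.
        System.out.println(mapper.writeValueAsString(process)); // {"id":7,"state":"ON_HOLD"}
    }
}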

Unknown types

One of the things Jackson can manage is polymorphism, but this implies we sometimes need to deal with unknown types. We have a few options for this:

Error when unknown type

We prepare Jackson to read and deal with the known types, but it will throw an error when an unknown type is given, this being the default behaviour:

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

@JsonTypeInfo(
    use = JsonTypeInfo.Id.NAME,
    include = JsonTypeInfo.As.PROPERTY)
@JsonSubTypes({
    @JsonSubTypes.Type(value = SelectionProcess.class, name = "SELECTION_PROCESS"),
})
interface Process {
}
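
As a hypothetical illustration of that default behaviour (and assuming the SelectionProcess class referenced above exists), reading a payload with an unregistered type name fails fast:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.exc.InvalidTypeIdException;

class UnknownTypeDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        try {
            // "VALIDATION_PROCESS" has no registered subtype in this version of the service.
            mapper.readValue("{\"@type\":\"VALIDATION_PROCESS\"}", Process.class);
        } catch (InvalidTypeIdException unknownType) {
            // Default behaviour: the unknown type id is rejected.
            System.out.println("Rejected: " + unknownType.getMessage());
        }
    }
}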

Keeping the new type

In a very similar way to what we have done for fields, Jackson allows us to define a default or fallback type when the given type is not found which, put together with our previous unknown-fields implementation, can solve our problem.

import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

import java.util.LinkedHashMap;
import java.util.Map;

@JsonTypeInfo(
    use = JsonTypeInfo.Id.NAME,
    include = JsonTypeInfo.As.PROPERTY,
    property = "@type",
    defaultImpl = AnyProcess.class)
@JsonSubTypes({
    @JsonSubTypes.Type(value = SelectionProcess.class, name = "SELECTION_PROCESS"),
    @JsonSubTypes.Type(value = ValidationProcess.class, name = "VALIDATION_PROCESS"),
})
interface Process {
    String getType();
}

// Fallback implementation used when the type name is not recognised.
class AnyProcess implements Process {
    @JsonAnySetter
    private final Map<String, Object> unknownFields = new LinkedHashMap<>();

    @JsonProperty("@type")
    private String type;

    @Override
    public String getType() {
        return type;
    }

    @JsonAnyGetter
    public Map<String, Object> getUnknownFields() {
        return unknownFields;
    }
}
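
As a hypothetical sketch of the fallback in action (AUDIT_PROCESS is an invented type name this service has never heard of, and the registered subtype classes are assumed to exist):

import com.fasterxml.jackson.databind.ObjectMapper;

class FallbackTypeDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // A message produced by a newer service for a type we have no class for.
        String json = "{\"@type\":\"AUDIT_PROCESS\",\"auditor\":\"jane.doe\"}";

        Process process = mapper.readValue(json, Process.class);

        System.out.println(process instanceof AnyProcess);             // true: the fallback was used
        System.out.println(((AnyProcess) process).getUnknownFields()); // {auditor=jane.doe}
    }
}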

And, with all of this, we have decent compatibility implemented, all provided by the Jackson serialisation.

We can go one step further and implement some base classes with the common code, e.g., unknownFields, and make our API objects extend them, for simplicity, to avoid boilerplate code and follow some good practices. Something similar to:

class Compatibility {
    ....
}

class MyApiObject extends Compatibility {
    ...
}
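
One possible shape for such a base class, just as a sketch and not necessarily the exact implementation, could centralise the unknown-fields handling (the enum helpers from the earlier example could live there too):

import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;

import java.util.LinkedHashMap;
import java.util.Map;

abstract class Compatibility {

    @JsonAnySetter
    private final Map<String, Object> unknownFields = new LinkedHashMap<>();

    @JsonAnyGetter
    public Map<String, Object> getUnknownFields() {
        return unknownFields;
    }
}

// API objects only declare the fields they actually know about.
class MyApiObject extends Compatibility {

    private Long id;

    // known fields, getters and setters omitted for brevity
}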

With this, we have a new tool under our belt that we can consider and use whenever it is necessary.


Microservices: Capability model

Microservices is an area that is still evolving. There is no standard or reference architecture for microservices. Some of the architectures publicly available nowadays are from vendors and, obviously, they try to promote their own tool stacks.

But, even without a specific standard or reference, we can sketch out some guidelines to design and develop microservices.

Image 1. Capability Model (seen in “Spring 5.0 Microservices – Second Edition”)

As we can see, the capability model is mainly split into four different areas:

  • Core capabilities (per microservice).
  • Supporting capabilities.
  • Process and governance capabilities.
  • Infrastructure capabilities.

Core capabilities

Core capabilities are those components generally packaged inside a single microservice. In the case of the fat-jar approach, everything will be inside the file we are generating.

Service listeners and libraries

This box refers to the listeners and libraries the microservice has in place to accept service requests. They can be HTTP listeners, message listeners and more. There is one exception, though: if the microservice is in charge only of scheduled tasks, it may not need listeners at all.

Storage capability

Microservices can have some kind of storage to do their task properly: physical storage like MySQL, MongoDB or Elasticsearch, or in-memory storage, caches or in-memory data grids like Ehcache, Hazelcast or others. Whatever type of storage is used, it will be owned by the microservice.

Service implementation

This is where the business logic is implemented. It should follow traditional design approaches like modularization and a multi-layered structure. Different microservices can be implemented in different languages and, as a recommendation, they should be as stateless as possible.

Service endpoint

This box just refers to the external APIs offered by the microservice. Both asynchronous and synchronous endpoints are included, and it is possible to use technologies ranging from REST/JSON to messaging.

Infrastructure capabilities

To deploy our application, and for the application to work properly, we need some infrastructure and infrastructure management capabilities.

Cloud

For obvious reasons, microservice architectures fit better in cloud-based environments than in traditional data centre environments. Things like scaling, cost-effective management and the cost of the physical infrastructure and operations make a cloud solution more cost-effective on many occasions.

We can find different providers like AWS, Azure or IBM Bluemix.

Container runtime

There are multiple options here and, obviously, containers are not the only solution. There are options like virtual machines but, from a resources point of view, the latter consume more of them. In addition, it is usually much faster to start a new container instance than to start a new virtual machine.

Here, we can find technologies like Docker, Rocket and LXD.

Container orchestration

One of the challenges in the microservices world is that, as the number of instances, containers or virtual machines grows, manual provisioning and deployment become complex, if not impossible. Here is where container orchestration tools like Kubernetes, Mesos or Marathon come in quite handy, helping us to automatically deploy applications, adjust traffic flows and replicate instances, among other things.

Supporting capabilities

They are not directly related to the microservices themselves, but they are essential for supporting large systems.

Service gateway

The service gateway helps us with routing, policy enforcement, acting as a proxy for our services or composing multiple service endpoints. There are several options; one of them is the Spring Cloud Zuul gateway.

Software defined load balancer

Our load balancers should be smart enough to manage situations where new instances are added or removed and, in general, whenever there are changes in the topology.

There are a few solutions; one of them is the combination of Ribbon, Eureka and Zuul in Spring Cloud Netflix. A second one is the Marathon Load Balancer.

Central log management

When the number of microservices in our system grows, operations that previously happened on one server now take place on multiple servers. All these servers produce logs, and having them on different machines can make debugging errors quite difficult. For this reason, we should have a centrally-managed log repository. In addition, all generated logs should carry a correlation ID so an execution can be tracked easily.

Service discovery

With the number of services increasing, static service resolution is close to impossible. To support all the new additions, we need a service discovery mechanism that can deal with this situation at runtime. One option is Spring Cloud Eureka. A different one, more focused on container discovery, is Mesos.

Security service

Monolithic applications were able to manage security themselves but, in a microservices ecosystem, we need authentication and token services to allow all the communication flows in our ecosystem.

Spring offers a couple of solutions, like Spring OAuth or Spring Security, but any single sign-on solution should be good.

Service configuration

As we said in the previous article, configuration should be externalized. An interesting choice is to set up a configuration server in our environments. Spring, again, provides Spring Cloud Config, but there are other alternatives.

Ops monitoring

We need to remember that now, with all these new instances scaling up and down, environment changes, service dependencies and new deployments going on, one of the most important things is to monitor our system. Tools like Spring Cloud Netflix Turbine or the Hystrix dashboard provide service-level information. Other tools, like AppDynamics or New Relic, provide end-to-end monitoring.

Dependency management

The use of dependency management visualization tools is recommended to stay aware of the system's complexity. They will help us check dependencies among services and make appropriate design decisions.

Data lake

As we have said before, each microservice should have its own data storage, and this should not be shared between different microservices. From a design point of view, this is a great solution but, sometimes, organizations need to create reports or have business processes that use data from different services. To avoid unnecessary dependencies we can set up a data lake. Data lakes are like data warehouses where raw data is stored without any assumption about how the information is going to be used. In this way, any service that needs information about another service just needs to go to the data lake to find the data.

One of the things we need to consider with this approach is that we need to propagate changes to the data lake to keep the information in sync; tools that can help us with this include Spring Cloud Data Flow and Kafka.

Reliable messaging

We want to maximize the decoupling among microservices. The way to do this is to develop them to be as reactive as possible. For this, reliable messaging systems are needed. Tools like RabbitMQ, ActiveMQ or Kafka are good for this purpose.

Process and governance capabilities

Basically, how we put everything together and survive. We need some processes, tools and guidelines around microservice implementations.

DevOps

One of the keys to using a microservice-oriented architecture is being agile: quick deployments, builds, continuous integration, testing… Here is where a DevOps culture comes in handy, as opposed to a waterfall culture.

Automation tools

Continuous integration, continuous delivery, continuous deployments, test automation, all of them are needed or at least recommended in a microservices environment.

And again: testing, testing, testing. I cannot overstate how important this is. Now that we have our system split into microservices, we need to use mocking techniques to test and, to be completely confident, we need functional and integration tests.

Container registry

We are going to create containers and, in the same way we need a repository to store the artifacts we build, we need a container registry to store our container images. There are options like Docker Hub, Google Container Registry or Amazon EC2 Container Registry.

Microservice documentation

Microservice systems are based on communication: communication among microservices and calls to the APIs these microservices offer. We need to ensure that people who want to use our available APIs can understand how to do it. For this reason, it is important to have a good API repository that is:

  • Exposed via a web browser.
  • Easy to navigate.
  • Well organized.
  • Able to invoke and test the endpoints with examples.

For all of this we can use tools like Swagger or RAML.

Reference architecture and libraries

In an ecosystem like this, the need to set standards, reference models, best practices and guidelines on how to implement better microservices is even more important than before. All of this should live as architecture blueprints, libraries, tools and techniques promoted and enforced by the organization and the development teams.

I hope that, after this article, we have a rough idea of how to tackle the implementation of our systems following a microservice approach, plus a few tools to start playing with.

Note: Article based on my notes about the book “Spring 5.0 Microservices – Second Edition”. Rajesh R. V


Twelve-Factor Apps

Cloud computing is one of the most rapidly evolving technologies. It promises many benefits, such as cost advantages, speed, agility, flexibility and elasticity.

But how do we ensure an application can run seamlessly across multiple providers and take advantage of the different cloud services? It means the application can work effectively in a cloud environment and understand and utilize cloud behaviors such as elasticity, utilization-based charging, failure awareness, and so on.

It is important to follow certain factors while developing a cloud-native application. For this purpose, we have The Twelve-Factor App. The Twelve-Factor App is a methodology that describes the characteristics expected in a modern cloud-ready application.

The Twelve Factors

I. Codebase

This factor advises that each application should have a single code base, with multiple deployments of that same code base, for example development, testing and production. The code is typically managed in a VCS (Version Control System) like Git, Subversion or similar.

II. Dependencies

All applications should bundle their dependencies along with the application bundle. These can be managed with build tools like Maven or Gradle, which use descriptor files to specify the dependencies and resolve them from build artifact repositories.

III. Config

All configuration should be externalized from the code. The code should not change between environments; only the configuration properties should.
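
As a small, hypothetical illustration in Java (the variable name is made up), the application asks its environment for settings instead of hard-coding them:

class DatabaseConfig {

    // The same binary runs everywhere; only the environment variables change.
    static String databaseUrl() {
        String url = System.getenv("DATABASE_URL"); // assumed variable name
        if (url == null) {
            throw new IllegalStateException("DATABASE_URL is not configured");
        }
        return url;
    }
}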

IV. Backing services

All backing services should be accessible through an addressable URL. All services should be reachable through a URL without complex communication requirements.

V. Build, release, run

This factor advocates strong isolation among the build stage, the release stage and the run stage. The build stage refers to compiling and producing binaries, including any assets required. The release stage refers to combining binaries with environment-specific configuration parameters. The run stage refers to running the application on a specific execution environment. This pipeline is unidirectional.

VI. Processes

This factor suggests that processes should be stateless and share nothing. If the application is stateless, then it is fault tolerant and can be scaled out easily.

VII. Port binding

Applications developed following this methodology should be self-contained or standalone, not relying on the runtime injection of a web server into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port and listening to requests coming in on that port.
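
A minimal, self-contained sketch using the JDK's built-in HTTP server (just an illustration; any embedded server works the same way), where the application itself binds the port:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

class SelfContainedApp {
    public static void main(String[] args) throws Exception {
        // The port comes from the environment; the app, not an external container, binds it.
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));

        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "hello".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }
}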

VIII. Concurrency

This factor states that processes should be designed to scale out by replicating the processes, which means just spinning up another identical service instance.

IX. Disposability

This factor advocates building applications with minimal startup and shutdown times. This will help us in automated deployment environments where we need to bring instances up and down as quickly as possible.

X. Dev/Prod parity

This factor establishes the importance of keeping the development and production environments as close as possible. This does not necessarily include the local environments where developers write their code, where, maybe to save costs, they tend to run everything on one machine; but, at least, we should have a non-production environment close enough to our production environment.

XI. Logs

This factor advocates the use of a centralized logging framework to avoid writing logs to local storage, preventing bottlenecks caused by I/O that is not fast enough.

XII. Admin processes

This factor advises running admin tasks against the same release, and an identical environment, as the one the long-running processes run in. The admin code should be packaged along with the application code.

I recommend you read The Twelve-Factor App page and its different sections carefully.
