JVM Deep Dive

What is the JVM?

The Java Virtual Machine (JVM) is a specification that provides a runtime environment in which Java code can be executed. It is the component of the technology responsible for its hardware and operating system independence. The JVM has two primary functions:

  • To allow Java programs to run on any device or operating system (“Write once, run anywhere” principle).
  • To manage and optimize program memory.

There are three important distinctions that need to be made when talking about the JVM:

  • JVM specification: The specification document describes an abstract machine; it formally defines what is required in a JVM implementation but does not describe any particular implementation of the Java Virtual Machine. Implementation details are left to the creativity of implementors. The specification document for version 16 can be found here.
  • JVM implementations: They are concrete implementations of the JVM specification. As mentioned before, it is up to the implementors how to develop and materialise the specification. This allows different implementations to focus on improving different areas, prioritising different parts of the specification, or building non-standard implementations. Some reasons to develop an implementation are:
    • Platform support: Run Java on a platform for which Oracle does not provide a JVM.
    • Resource usage: Run Java on a device that does not have enough resources to run Oracle’s implementation.
    • Performance: Oracle’s implementation is not fast, scalable or predictable enough.
    • Licensing: Disagreement with Oracle’s licensing policy.
    • Competition: Offering an alternative.
    • Research or fun: Because, why not?
    • Some examples of implementations are Azul Zulu, Eclipse OpenJ9, GraalVM or HotSpot.
  • JVM instances: It is a running implementation of the JVM.

JVM Architecture

The JVM consists of three distinct components:

  • Class Loader
  • Runtime Memory/Data Area
  • Execution Engine

Class Loader

Class loaders are responsible for dynamically loading Java classes into the JVM data areas at runtime. There are three phases in the class loading process: loading, linking, and initialization.

Loading

Loading involves taking the binary representation (bytecode) of a class or interface with a particular name and creating the class or interface from it. There are three built-in class loaders available in Java:

  • Bootstrap Class Loader: It loads the standard Java packages like java.lang, java.net, java.util, and so on. These packages are present inside the rt.jar file and other core libraries present in the $JAVA_HOME/jre/lib directory.
  • Extension Class Loader: The extension class loader is a child of the bootstrap class loader and takes care of loading the extensions of the standard core Java classes so that it is available to all applications running on the platform. Extensions are present in the $JAVA_HOME/jre/lib/ext directory.
  • Application Class Loader: It loads the files present on the classpath. By default, the classpath is set to the current directory of the application, but it can be modified.

Class loading follows a hierarchical delegation pattern: a request is first delegated up the chain of parent class loaders (Application -> Extension -> Bootstrap) and, if a parent class loader is unable to find the class, the work falls back to the child class loader that received the request. If the last child class loader is not able to load the class either, it throws NoClassDefFoundError or ClassNotFoundException.
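
We can observe this hierarchy from code. A minimal sketch (the class name is just an example; since Java 9 the extension loader is called the platform class loader):

public class ClassLoaderDemo {

    public static void main(String[] args) {
        // Our own classes are loaded by the application class loader
        System.out.println(ClassLoaderDemo.class.getClassLoader());

        // Its parent is the extension (platform) class loader
        System.out.println(ClassLoaderDemo.class.getClassLoader().getParent());

        // Core classes are loaded by the bootstrap class loader, printed as null
        System.out.println(String.class.getClassLoader());
    }
}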

Linking

After a class is loaded into memory, it undergoes the linking process. Linking a class or interface involves combining the different elements and dependencies of the program together. Linking includes the following steps:

  • Verification: This phase checks the structural correctness of the .class file by checking it against a set of constraints or rules. If verification fails for some reason, we get a VerifyError; for example, if the file has been compiled for a different version of Java.
  • Preparation: In this phase, the JVM allocates memory for the static fields of a class or interface, and initializes them with default values.
  • Resolution: In this phase, symbolic references are replaced with direct references present in the runtime constant pool.

Initialisation

Initialisation involves executing the initialisation method of the class or interface. This includes executing the static blocks and assigning values to all the static variables. This is the final stage of class loading.
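
A minimal sketch of this order (the class name is just illustrative): the static field gets its default value during preparation and its real value only when the class is initialised.

public class InitialisationDemo {

    // During preparation this field is set to the default value 0
    private static int counter;

    // The static block runs once, during class initialisation
    static {
        counter = 42;
        System.out.println("Initialising the class");
    }

    public static void main(String[] args) {
        System.out.println(counter); // prints 42
    }
}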

Runtime Data Area

The Runtime Data Area is divided into five major components: the Method Area, the Heap Area, the Stack Area, the Program Counter (PC) Registers, and the Native Method Stacks.

Method Area

All the class-level data, such as the run-time constant pool, field and method data, and the code for methods and constructors, is stored here. If the memory available in the method area is not sufficient for the program start-up, the JVM throws an OutOfMemoryError.

It is important to note that the method area is created on virtual machine start-up, and there is only one method area per JVM.

Heap Area

All the objects and their corresponding instance variables are stored here. This is the run-time data area from which memory for all class instances and arrays is allocated.

Again, it is important to note that the heap is created on virtual machine start-up, and there is only one heap area per JVM.

Stack Area

Whenever a new thread is created in the JVM, a separate runtime stack is also created at the same time. All local variables, method calls, and partial results are stored in the stack area. If the processing being done in a thread requires a larger stack size than what’s available, the JVM throws a StackOverflowError.

For every method call, one entry is made in the stack memory which is called the Stack Frame. When the method call is complete, the Stack Frame is destroyed.

The Stack Frame is divided into three sub-parts:

  • Local Variables: Each frame contains an array of variables known as its local variables. All local variables and their values are stored here. The length of this array is determined at compile-time.
  • Operand Stack: Each frame contains a last-in-first-out (LIFO) stack known as its operand stack. This acts as a runtime workspace to perform any intermediate operations. The maximum depth of this stack is determined at compile-time.
  • Frame Data: All symbols corresponding to the method are stored here. This also stores the catch block information in case of exceptions.
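
As an illustration of a frame being pushed for every method call, this small sketch (purely for demo purposes) exhausts the stack with unbounded recursion and catches the resulting StackOverflowError:

public class StackDemo {

    private static long depth = 0;

    private static void recurse() {
        depth++;      // every call pushes a new stack frame
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Stack exhausted after " + depth + " frames");
        }
    }
}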

Program Counter (PC) Registers

The JVM supports multiple threads at the same time. Each thread has its own PC Register to hold the address of the currently executing JVM instruction. Once the instruction is executed, the PC register is updated with the next instruction.

Native Method Stacks

The JVM contains stacks that support native methods. These methods are written in a language other than Java, such as C and C++. For every new thread, a separate native method stack is also allocated.

Execution Engine

Once the bytecode has been loaded into the main memory, and details are available in the runtime data area, the next step is to run the program. The Execution Engine handles this by executing the code present in each class.

However, before executing the program, the bytecode needs to be converted into machine language instructions. The JVM can use an interpreter or a JIT compiler for the execution engine.

The Execution Engine has three main components:

Interpreter

The interpreter reads and executes the bytecode instructions one by one. Because of this instruction-by-instruction execution, the interpreter is comparatively slow. In addition, if a method is called multiple times, a new interpretation is required every time.

JIT Compiler

The JIT Compiler neutralizes the disadvantage of the interpreter. The Execution Engine still uses the interpreter to convert bytecode but, when it finds repeated (hot) code, it uses the JIT compiler, which compiles that bytecode into native code. The native code is then used directly for repeated method calls, which improves the performance of the system. The JIT Compiler has the following components:

  • Intermediate Code Generator: Generates intermediate code.
  • Code Optimizer: Optimizes the intermediate code for better performance.
  • Target Code Generator: Converts intermediate code to native machine code.
  • Profiler: Finds the hotspots (code that is executed repeatedly)

Garbage Collector

The Garbage Collector (GC) collects and removes unreferenced objects from the heap area. It is the process of automatically reclaiming unused runtime memory by destroying the objects that are no longer reachable. Garbage collection makes Java memory-efficient because it frees space for new objects.

Garbage collection is done automatically by the JVM at regular intervals and does not need to be handled separately but, it can also be requested by calling System.gc(), although its execution is not guaranteed.
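
As a minimal sketch, making an object unreachable and hinting the JVM to collect it could look like this (the call is only a request and may be ignored):

public class GcDemo {

    public static void main(String[] args) {
        byte[] data = new byte[10_000_000];
        System.out.println(data.length);

        data = null;   // the array becomes unreachable

        System.gc();   // only a hint; execution is not guaranteed
    }
}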

Garbage collection involves two phases:

  • Mark: In this step, the GC identifies the unused objects in memory.
  • Sweep: In this step, the GC removes the objects identified during the previous phase.

The JVM provides different garbage collector implementations:

  • Serial GC: This is the simplest implementation of GC, and is designed for small applications running on single-threaded environments. It uses a single thread for garbage collection. When it runs, it leads to a stop-the-world event where the entire application is paused.
  • Parallel GC: This was the default GC implementation in the JVM up to Java 8 (G1 is the default since Java 9), and it is also known as the Throughput Collector. It uses multiple threads for garbage collection but still pauses the application when running.
  • Garbage First (G1) GC: G1GC was designed for multi-threaded applications that have a large heap size available (more than 4GB). It partitions the heap into a set of equal size regions and uses multiple threads to scan them. G1GC identifies the regions with the most garbage and performs garbage collection on that region first.
  • Concurrent Mark Sweep (CMS) GC: Deprecated in Java 9 and removed in Java 14.

Java Native Interface (JNI)

JNI acts as a bridge that permits supporting packages written in other programming languages, such as C, C++, and so on. This is especially helpful in cases where you need to write code that is not entirely supported by Java, like some platform-specific features that can only be written in C.

Native Method Libraries

Native Method Libraries are libraries that are written in other programming languages, such as C, C++, and assembly. These libraries are usually present in the form of .dll or .so files. These native libraries can be loaded through JNI.

JVM memory structure

As exposed earlier, the JVM manages memory automatically with the help of the garbage collector. Memory management is the process of allocating and de-allocating objects in memory.

We have already described the main memory areas in the “Runtime Data Area” section but, let’s explore a bit more how they work.

Heap Area

Heap space in Java is used for dynamic memory allocation for Java objects and JRE classes at the runtime. New objects are always created in heap space and the references to these objects are stored in stack memory. These objects have global access and can be accessed from anywhere in the application. This memory model is further broken into smaller parts called generations, these are:

  • Young Generation: This is where all new objects are allocated and aged. A minor Garbage collection occurs when this fills up.
    • Eden space: It is a part of the Young Generation space. When we create an object, the JVM allocates memory from this space.
    • Survivor space: It is also a part of the Young Generation space. Survivor space contains existing objects that have survived the minor GC phases. There are a Survivor Space 0 and a Survivor Space 1.
  • Old or Tenured Generation: This is where long surviving objects are stored. When objects are stored in the Young Generation, a threshold for the object’s age is set and when that threshold is reached, the object is moved to the old generation.
  • Permanent Generation: This consists of JVM metadata for the runtime classes and application methods.

This area works as follows:

  1. When an object is created, it is first allocated to the Eden space because this space is not that big and gets full quite fast. The garbage collector runs on the Eden space and clears all non-referenced objects.
  2. When the GC runs, it moves all objects surviving the garbage collection process into Survivor Space 0 and, if they still survive, objects from Survivor Space 0 into Survivor Space 1.
  3. If an object survives X rounds of garbage collection (X depends on the JVM implementation), it is most likely that it will survive forever, and it gets moved into the Old space.

Metaspace (PermGen)

Metaspace is a new memory space starting from the Java 8 version; it has replaced the older PermGen memory space.

Metaspace is a special memory space separated from the main heap. The JVM keeps track of loaded class metadata in the Metaspace. Additionally, the JVM stores all the static content in this memory section. This includes all the static methods, primitive variables, and references to the static objects. It also contains data about bytecode, names, and JIT information.

Class metadata are the runtime representation of java classes within a JVM process, basically any information the JVM needs to work with a Java class. That includes, but is not limited to, runtime representation of data from the JVM class file format.

Metaspace is only released when a GC runs and unloads class loaders.
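
If needed, the Metaspace can be tuned with dedicated switches; a sketch with arbitrary values and a placeholder jar name could be:

java -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -jar application.jar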

Performance Enhancements

In the Oracle documentation, some performance enhancements can be found. These enhancements are:

  • Compact Strings
  • Tiered Compilation
  • Compressed Ordinary Object Pointer
  • Zero-Based Compressed Ordinary Object Pointers
  • Escape Analysis

For details on these enhancements, it is better to check the documentation, where there is a solid explanation of these topics.

Tuning the Garbage Collector

Tuning should be the last option we use for increasing the throughput of the application, and only when we see a drop in performance because of longer GC pauses causing application timeouts.

Java provides a lot of memory switches that we can use to set the memory sizes and their ratios. Some of the commonly used memory switches are:

-Xms For setting the initial heap size when the JVM starts
-Xmx For setting the maximum heap size
-Xmn For setting the size of the young generation (the rest is the old generation)
-XX:PermSize For setting the initial size of the Permanent Generation memory (before Java 8)
-XX:MaxPermSize For setting the maximum size of the Permanent Generation memory (before Java 8)
-XX:SurvivorRatio For setting the ratio of Eden space to survivor space size
-XX:NewRatio For setting the ratio of old/new generation sizes. The default value is 2
-XX:+UseSerialGC For enabling the Serial garbage collector
-XX:+UseParallelGC For enabling the Parallel garbage collector
-XX:+UseConcMarkSweepGC For enabling the CMS garbage collector
-XX:ParallelCMSThreads=<n> For setting the number of threads the CMS collector uses
-XX:+UseG1GC For enabling the G1 garbage collector
-XX:+HeapDumpOnOutOfMemoryError For creating a heap dump file when an OutOfMemoryError occurs
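
For example, a start command combining some of these switches could look like the following sketch (the sizes and the jar name are arbitrary):

java -Xms512m -Xmx2g -Xmn512m \
     -XX:+UseG1GC \
     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof \
     -jar application.jar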

Tools

After all this explanation and, in addition to all the configuration options, a very interesting point is the monitoring of our Java memory. To do this, we have multiple tools we can use:

jstat

It is a utility that provides information about the performance and resource consumption of running Java applications. We can use the command with the garbage collection option to obtain various pieces of information from a Java process.
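
For example, assuming we know the process id, a sketch of the invocation could be:

# Print GC statistics for the process <pid> every second, five times
jstat -gc <pid> 1000 5

The output contains, among others, the following columns: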

S0C – Current survivor space 0 capacity (KB)
S1C – Current survivor space 1 capacity (KB)
S0U – Survivor space 0 utilization (KB)
S1U – Survivor space 1 utilization (KB)
EC – Current eden space capacity (KB)
EU – Eden space utilization (KB)
OC – Current old space capacity (KB)
OU – Old space utilization (KB)
MC – Metaspace capacity (KB)
MU – Metaspace utilization (KB)
CCSC – Compressed class space capacity (KB)
CCSU – Compressed class space used (KB)
YGC – Number of young generation garbage collector events
YGCT – Young generation garbage collector time
FGC – Number of full GC events
FGCT – Full garbage collector time
GCT – Total garbage collector time

jmap

It is a utility to print memory-related statistics for a running VM or core file. It provides enhanced diagnostics with a reduced performance overhead.

jcmd

It is a utility used to send diagnostic command requests to the JVM; these requests are useful for controlling, troubleshooting, and diagnosing JVM and Java applications. It must be used on the same machine where the JVM is running and must have the same effective user and group identifiers that were used to launch the JVM.
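
A couple of illustrative invocations (the process id and the dump path are placeholders):

# List the diagnostic commands supported by the target JVM
jcmd <pid> help

# Request a heap dump in binary format
jcmd <pid> GC.heap_dump /tmp/heap.hprof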

jhat

It is a utility that provides a convenient, browser-based Heap Analysis Tool (HAT). The tool parses a heap dump in binary format (e.g., a heap dump produced by jcmd).

This is slightly different from the other tools because, when we execute it, it creates a web server we can access from our browser to read the results.

VisualVM

VisualVM allows us to get detailed information about Java applications while they are running on a JVM, either on a local or a remote system. It is also possible to capture data about the JVM and save it to the local system. VisualVM can do CPU sampling, memory sampling, run garbage collectors, analyze heap errors, take snapshots, and more.

JConsole

It is a graphical monitoring tool to monitor JVM and Java applications both on a local or remote machine. JConsole uses the underlying features of JVM to provide information on the performance and resource consumption of applications running on the Java platform using Java Management Extension (JMX) technology.

Memory Analyzer (MAT)

The Eclipse Memory Analyzer is a fast and feature-rich graphical Java heap analyzer that helps you find memory leaks and reduce memory consumption.


Cache: Spring Boot + Redis

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Redis.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.

— Redis web page —

Let’s do it.

As a base project we are going to use a similar code to the one written for the previous articles: “Cache: Spring Boot + Ehcache” or “Cache: Spring Boot + Caffeine“.

An extra step we need to take here is the creation of a ‘docker-compose.yml‘ file to run Redis. We are going to be using the official image provided by Docker Hub. The content of our compose file will be:

version: '3'

services:
  redis:
    image: redis
    ports:
      - 6379:6379

Once we have Redis running and our new endpoint ready to go, it is time to start configuring Redis.

First, we are going to create our configuration class. To activate the cache capabilities on Spring we can use the configuration and enable configuration annotations:

  • @Configuration
  • @EnableCaching
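
A minimal sketch of that configuration class (the class name is just an example) could be:

@Configuration
@EnableCaching
public class CacheConfig {
    // No beans are needed here: Spring Boot auto-configures the Redis-backed
    // CacheManager from the values in 'application.properties'.
}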

And, surprisingly, that’s all the Java configuration we need to write because Spring auto-configuration takes care of the rest. To allow this, we need to add our Redis properties to the ‘application.properties‘ file.

spring.cache.type=redis
spring.redis.host=localhost
spring.redis.port=6379

As simple as that, now, if we have the docker container running, when we start our application it will be able to talk to Redis.

Now on the service, we just need to add the appropriate annotation to indicate we want to use the cache.

@Cacheable(value = "md5-cache")
@Override
public String generateMd5(final String text) {
    log.info("Generating the MD5 hash...");

    try {
        final MessageDigest md = MessageDigest.getInstance("MD5");

        md.update(text.getBytes());

        return DatatypeConverter.printHexBinary(md.digest()).toUpperCase();
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("Unable to get MD5 instance");
    }
}

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoints, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/hola

The result should be something like this:

2020-11-01 10:30:06.297 : Generating the MD5 hash...

As we can see, invoking the endpoint multiple times only created the first log line and, from this point on, any invocation will be served from the cache.

Obviously, this is a pretty simple example but, this can help us to increase the performance of our system for more complex operations.

As usual, you can find the code here.


Cache: Spring Boot + Caffeine

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Caffeine.

Caffeine is a high performance, near optimal caching library based on Java 8. For more details, see our user’s guide and browse the API docs for the latest release.

— Caffeine wiki —

Let’s do it.

As a base project we are going to use a similar code to the one written for the previous article “Cache: Spring Boot + Ehcache“.

The only thing we are going to change is that we are going to duplicate the existing endpoint to be able to try two different ways of working with Caffeine.

Once we have our new endpoint ready to go, it is time to start configuring Caffeine. We are going to take two different approaches:

  1. Make use of Spring injection capabilities.
  2. A more manual approach.

Leveraging Spring injection capabilities

First, we are going to create our configuration class. To activate the cache capabilities on Spring we can use the configuration and enable configuration annotations:

  • @Configuration
  • @EnableCaching

With this, we can now add the beans to create our cache and configure Caffeine appropriately.

@Bean
@SuppressWarnings("all")
public Caffeine caffeineConfig() {
    return Caffeine.newBuilder()
        .maximumSize(50)
        .expireAfterWrite(10, TimeUnit.SECONDS)
        .removalListener(CacheEventLogger.removalListener());
}

@Bean
@SuppressWarnings("all")
public CacheManager cacheManager(final Caffeine caffeine) {
    final CaffeineCacheManager caffeineCacheManager = new CaffeineCacheManager();

    caffeineCacheManager.setCaffeine(caffeine);

    return caffeineCacheManager;
}

As you can see, something pretty simple. I have tried to mimic the configuration set for the Ehcache example in the previous article. If you have not read it, you can check it now. A summary of this configuration is:

  • Cache size: 50 entries.
  • And the expiration policy: expiration 10 seconds after write.

Now on the service, we just need to add the appropriate annotation to indicate we want to use the cache.

@Cacheable(cacheNames = MD5_CACHE_ID)
@Override
public String generateMd5SpringCache(final String text) {
    log.info("The value was not cached by Spring");
    return generateMd5(text);
}

That simple.

Manual approach

We have the possibility of creating the cache manually. This can be desirable for multiple reasons: not having Spring available, wanting component isolation, not wanting to deal with the cache manager and multiple caches or because, for whatever reason, we, as developers, decide that it fits our use case best.

To do this manual configuration we just need to exclude the configuration class and the beans’ creation, create the cache in our service class and invoke it when a request arrives.

final LoadingCache<String, String> md5Cache = Caffeine.newBuilder()
    .maximumSize(50)
    .expireAfterWrite(10, TimeUnit.SECONDS)
    .removalListener(CacheEventLogger.removalListener())
    .build(this::generateMd5Wrapper);

@Override
public String generateMd5ManualCache(final String text) {
    return md5Cache.get(text);
}

Nothing too fancy. It is worth a note about the method ‘generateMd5Wrapper‘. It is completely unnecessary; the only reason it has been created is to be able to write an extra log line for the demo and have visible effects of the cache working.

The last thing we have defined is a removal listener to log when an object is removed from the cache. Again, this is just for demo purposes and, it is not necessary.

public static RemovalListener<String, String> removalListener() {
    return (String key, String graph, RemovalCause cause) ->
        log.info("Key {} was removed ({})", key, cause);
}

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoints, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/spring/hola
curl http://localhost:8080/api/hashes/manual/hola

The result should be something like this:

2020-10-31 08:15:19.610 : The value was not cached by Spring
2020-10-31 08:15:35.316 : The value was not cached by Spring
2020-10-31 08:15:35.317 : Key hola was removed (EXPIRED)
2020-10-31 08:15:39.717 : The value was not cached manually
2020-10-31 08:15:55.443 : The value was not cached manually
2020-10-31 08:15:55.443 : Key hola was removed (EXPIRED)

As we can see, invoking the endpoint multiple times only created the first log line and it is just after waiting for some time (more than 10 seconds) that the cache entry gets expired and re-created.

Obviously, this is a pretty simple example but, this can help us to increase the performance of our system for more complex operations.

As usual, you can find the code here.


Cache: Spring Boot + Ehcache

Today, we are going to explore a little bit one of the cache options we have available when working with Java projects. This option is Ehcache.

Ehcache is an open-source, standards-based cache that boosts performance, offloads your database, and simplifies scalability. It’s the most widely-used Java-based cache because it’s robust, proven, full-featured, and integrates with other popular libraries and frameworks. Ehcache scales from in-process caching, all the way to mixed in-process/out-of-process deployments with terabyte-sized caches.

— Ehcache web page —

In our case, we are going to use Ehcache version 3, as it provides an implementation of a JSR-107 cache manager, together with Spring Boot, to create a simple endpoint that is going to return the MD5 hash of a given text.

Let’s do it.

We are going to start with a Maven project and add various dependencies:

Project dependencies:

  • spring-boot-starter-parent (2.3.4.RELEASE): Parent of our project
  • spring-boot-starter-web: Version managed by the parent
  • spring-boot-starter-actuator: Version managed by the parent
  • spring-boot-starter-cache: Version managed by the parent
  • lombok: Version managed by the parent
  • javax.cache:cache-api (1.1.1)
  • org.ehcache:ehcache (3.8.1)

Now, let’s create the endpoint and the service we are going to cache. Assuming you, the reader, have some knowledge of Spring, I am not going to go into detail on this.

// Controller code
@RestController
@RequestMapping(value = "/api/hashes")
@AllArgsConstructor
public class HashController {

    private final HashService hashService;

    @GetMapping(value = "/{text}", produces = APPLICATION_JSON_VALUE)
    public HttpEntity<String> generate(@PathVariable final String text) {
        return ResponseEntity.ok(hashService.generateMd5(text));
    }
}

// Service code
@Service
public class HashServiceImpl implements HashService {

    @Override
    public String generateMd5(final String text) {
        try {
            final MessageDigest md = MessageDigest.getInstance("MD5");

            md.update(text.getBytes());

            return DatatypeConverter.printHexBinary(md.digest()).toUpperCase();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("Unable to get MD5 instance");
        }
    }
}

Simple stuff. Let’s now add the cache capabilities.

First, we will add the cache configuration to our service as a new annotation.

@Cacheable(value = "md5-cache")

This defines the name that is going to be used for this cache: ‘md5-cache’. As the key, the content of the method parameter will be used.

The next step is to add the configuration. To activate the cache capabilities on Spring we can use the configuration and enable configuration annotations:

  • @Configuration
  • @EnableCaching

Even with this, and using the Spring Boot auto-configuration, no caches are created by default and we need to create them. There are two ways this can be done:

  • Using an XML file with the configuration.
  • Programmatically.

If you are a follower of this blog or you have read some of the existing posts, you have probably realised I am not a big fan of XML configuration and I prefer to do things programmatically, and this is what we are going to do. In any case, I will try to add the XML equivalent of the configuration but, it has not been tested.

The full configuration is:

@Bean
CacheManager getCacheManager() {
    final CachingProvider provider = Caching.getCachingProvider();
    final CacheManager cacheManager = provider.getCacheManager();

    final CacheConfigurationBuilder<String, String> configurationBuilder =
        CacheConfigurationBuilder.newCacheConfigurationBuilder(
            String.class, String.class,
            ResourcePoolsBuilder.heap(50)
                .offheap(10, MemoryUnit.MB))
        .withExpiry(ExpiryPolicyBuilder.timeToIdleExpiration(Duration.ofSeconds(10)));

    final CacheEventListenerConfigurationBuilder asynchronousListener = CacheEventListenerConfigurationBuilder
        .newEventListenerConfiguration(new CacheEventLogger(), EventType.CREATED, EventType.EXPIRED)
        .unordered().asynchronous();

    cacheManager.createCache("md5-cache",
        Eh107Configuration.fromEhcacheCacheConfiguration(configurationBuilder.withService(asynchronousListener)));

    return cacheManager;
}

But, let’s explain it in more detail.

final CacheConfigurationBuilder<String, String> configurationBuilder =
    CacheConfigurationBuilder.newCacheConfigurationBuilder(
        String.class, String.class,
        ResourcePoolsBuilder.heap(50)
            .offheap(10, MemoryUnit.MB))
    .withExpiry(ExpiryPolicyBuilder.timeToIdleExpiration(Duration.ofSeconds(10)));

Here we can see some of the cache characteristics:

  • The type of data: String for both, key and value.
  • Cache size: heap of 50 entries plus 10MB off-heap (obviously absurd numbers but good enough to exemplify).
  • And the expiration policy: ‘Time to Idle’ of 10 seconds. It can be defined as ‘Time to Live’.

The next thing we are creating is a cache listener to log the operations:

final CacheEventListenerConfigurationBuilder asynchronousListener = CacheEventListenerConfigurationBuilder
            .newEventListenerConfiguration(new CacheEventLogger(), EventType.CREATED, EventType.EXPIRED)
            .unordered().asynchronous();

Basically, we are going to log a message when the cache creates or expires an entry. Other events can be added.

And, finally, we create the cache:

cacheManager.createCache("md5-cache",
    Eh107Configuration.fromEhcacheCacheConfiguration(configurationBuilder.withService(asynchronousListener)));

With the name matching the one we have used on the service annotation.

The XML configuration should be something like:

<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.ehcache.org/v3"
    xmlns:jsr107="http://www.ehcache.org/v3/jsr107"
    xsi:schemaLocation="
        http://www.ehcache.org/v3 http://www.ehcache.org/schema/ehcache-core-3.0.xsd
        http://www.ehcache.org/v3/jsr107 http://www.ehcache.org/schema/ehcache-107-ext-3.0.xsd">

<cache alias="md5-cache">
    <key-type>java.lang.String</key-type>
    <value-type>java.lang.String</value-type>
    <expiry>
        <tti unit="seconds">10</tti>
    </expiry>

    <listeners>
        <listener>
            <class>dev.binarycoders.ehcache.utils.CacheEventLogger</class>
            <event-firing-mode>ASYNCHRONOUS</event-firing-mode>
            <event-ordering-mode>UNORDERED</event-ordering-mode>
            <events-to-fire-on>CREATED</events-to-fire-on>
            <events-to-fire-on>EXPIRED</events-to-fire-on>
        </listener>
    </listeners>

    <resources>
        <heap unit="entries">50</heap>
        <offheap unit="MB">10</offheap>
    </resources>
</cache>
</config>

Just remember we need to add a property to our ‘application.properties’ file if we choose the XML approach.

spring.cache.jcache.config=classpath:ehcache.xml

And, with this, everything should be in place to test it. We just need to run our application and invoke our endpoint, for example, using ‘curl’.

curl http://localhost:8080/api/hashes/hola

The result should be something like this:

2020-10-25 11:29:22.364 : Type: CREATED, Key: hola, Old: null, New: D41D8CD98F00B204E9800998ECF8427E
2020-10-25 11:29:42.707 : Type: EXPIRED, Key: hola, Old: D41D8CD98F00B204E9800998ECF8427E, New: null
2020-10-25 11:29:42.707 : Type: CREATED, Key: hola, Old: null, New: D41D8CD98F00B204E9800998ECF8427E

As we can see, invoking the endpoint multiple times only created the first log line and it is just after waiting for some time (more than 10 seconds) that the cache entry gets expired and re-created.

Obviously, this is a pretty simple example but, this can help us to increase the performance of our system for more complex operations.

As usual, you can find the code here.


Spring Application Events

Today, we are going to implement a simple example using spring application events.

Spring application events allow us to throw and listen to specific application events that we can process as we wish. Events are meant for exchanging information between loosely coupled components. As there is no direct coupling between publishers and subscribers, it enables us to modify subscribers without affecting the publishers and vice-versa.

To build our PoC and to execute it, we are going to need just a few classes. We will start with a basic Spring Boot project with the ‘web’ starter. And, once we have that in place (you can use the Spring Initializr) we can start adding our classes.

Let’s start with a very basic ‘User’ model:

public class User {

    private String firstname;
    private String lastname;

    public String getFirstname() {
        return firstname;
    }

    public User setFirstname(String firstname) {
        this.firstname = firstname;
        return this;
    }

    public String getLastname() {
        return lastname;
    }

    public User setLastname(String lastname) {
        this.lastname = lastname;
        return this;
    }

    @Override
    public String toString() {
        return "User{" +
                "firstname='" + firstname + '\'' +
                ", lastname='" + lastname + '\'' +
                '}';
    }
}

Nothing out of the ordinary here. Just a couple of properties and some getter and setter methods.

Now, let’s build a basic service that is going to simulate a ‘register’ operation:

...
import org.springframework.context.ApplicationEventPublisher;
...

@Service
public class UserService {

    private static final Logger logger = LoggerFactory.getLogger(UserService.class);

    private final ApplicationEventPublisher publisher;

    public UserService(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    public void register(final User user) {
        logger.info("Registering {}", user);

        publisher.publishEvent(new UserRegistered(user));
    }
}

Here we have the first reference to the event classes the Spring Framework offers us: the ‘ApplicationEventPublisher’, which will allow us to publish the desired event to be consumed by listeners.

The second reference we are going to have to the events framework is when we create an event class we are going to send. In this case, the class ‘UserRegistered’ we can see on the publishing line above.

...
import org.springframework.context.ApplicationEvent;

public class UserRegistered extends ApplicationEvent {

    public UserRegistered(User user) {
        super(user);
    }
}

As we can see, extending the class ‘ApplicationEvent’ we have very easily something we can publish and listen to it.

Now, let’s implement some listeners. The first of them is going to implement the interface ‘ApplicationListener’ and, the second one, is going to be annotation-based. Two simple options offered by Spring to build our listeners.

...
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

public class UserListeners {

    // Technical note: By default listener events return 'void'. If an object is returned, it will be published as an event

    /**
     * Example of event listener using the implementation of {@link ApplicationListener}
     */
    static class RegisteredListener implements ApplicationListener<UserRegistered> {

        private static final Logger logger = LoggerFactory.getLogger(RegisteredListener.class);

        @Override
        public void onApplicationEvent(UserRegistered event) {
            logger.info("Registration event received for {}", event);
        }
    }

    /**
     * Example of annotation based event listener
     */
    @Component
    static class RegisteredAnnotatedListener {

        private static final Logger logger = LoggerFactory.getLogger(RegisteredAnnotatedListener.class);

        @EventListener
        void on(final UserRegistered event) {
            logger.info("Annotated registration event received for {}", event);
        }
    }
}

As we can see, very basic stuff. It is worth mentioning the ‘Technical note’. By default, the listener methods return ‘void’; they are initially designed to receive an event, do some work, and finish. But, obviously, they can at the same time publish new messages, and we can achieve this easily by returning an object. The returned object will be published as any other event.
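
As a sketch of that behaviour, a listener could return a hypothetical ‘WelcomeEmailRequested’ event (not part of this example’s code; assumed to extend ApplicationEvent with a constructor taking the source) and Spring would publish it as a new event:

@Component
static class WelcomeEmailListener {

    @EventListener
    WelcomeEmailRequested on(final UserRegistered event) {
        // The returned object is published again as a new application event
        return new WelcomeEmailRequested(event.getSource());
    }
}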

Once we have all of this, let’s build a simple controller to run the process:

@RestController
@RequestMapping("/api/users")
public class UserController {

    private final UserService userService;

    public UserController(UserService userService) {
        this.userService = userService;
    }

    @GetMapping
    @ResponseStatus(HttpStatus.CREATED)
    public void register(@RequestParam("firstname") final String firstname,
                         @RequestParam("lastname") final String lastname) {
        Objects.requireNonNull(firstname);
        Objects.requireNonNull(lastname);

        userService.register(new User().setFirstname(firstname).setLastname(lastname));
    }
}

Nothing out of the ordinary, simple stuff.

We can invoke the controller with any tool we want but, a simple way is using cURL.

curl -X GET "http://localhost:8080/api/users?firstname=john&lastname=doe"

Once we call the endpoint, we can see the log messages generated by the publisher and the listeners:

Registering User{firstname='john', lastname='doe'}
Annotated registration event received for dev.binarycoders.spring.event.UserRegistered[source=User{firstname='john', lastname='doe'}]
Registration event received for dev.binarycoders.spring.event.UserRegistered[source=User{firstname='john', lastname='doe'}]

As we can see, the ‘register’ action is executed and it publishes the event and both listeners, the annotated and the implemented one, receive and process the message.

As usual you can find the source for this example here, in the ‘spring-events’ module.

For some extra information, you can take a look at one of the videos of the last SpringOne.


Spring Boot with Kafka

Today we are going to build a very simple demo code using Spring Boot and Kafka.

The application is going to contain a simple producer and consumer. In addition, we will add a simple endpoint to test our development and configuration.

Let’s start.

The project is going to be using:

  • Java 14
  • Spring Boot 2.3.4

A good place to start generating our project is the Spring Initializr. There we can easily create the skeleton of our project, adding some basic information about it. We will be adding three dependencies:

  • Spring Web.
  • Spring for Apache Kafka.
  • Spring Configuration Processor (Optional).

Once we are done filling the form we only need to generate the code and open it on our favourite code editor.

As an optional dependency, I have added the “Spring Boot Configuration Processor” dependency to be able to define some extra properties that we will be using in the “application.properties” file. As I have said, it is optional; we are going to be able to define and use the properties without it but, it is going to solve the warning about them not being defined. Up to you.

With the three dependencies, our “pom.xml” should look something like:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.kafka</groupId>
  <artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-configuration-processor</artifactId>
  <optional>true</optional>
</dependency>

The next step is going to be creating our kafka producer and consumer to be able to send a message using the distributed event streaming platform.

For the producer code we are just going to create a basic method to send a message making use of the “KafkaTemplate” offered by Spring.

@Service
public class KafkaProducer {
    public static final String TOPIC_NAME = "example_topic";

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducer(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(final String message) {
        kafkaTemplate.send(TOPIC_NAME, message);
    }
}

The consumer code is going to be even more simple thanks to the “KafkaListener” provided by Spring.

@Service
public class KafkaConsumer {

    @KafkaListener(topics = {KafkaProducer.TOPIC_NAME}, groupId = "example_group_id")
    public void read(final String message) {
        System.out.println(message);
    }
}

And finally, to be able to test it, we are going to define a Controller to invoke the Kafka producer.

@RestController
@RequestMapping("/kafka")
public class KafkaController {

    private final KafkaProducer kafkaProducer;

    public KafkaController(KafkaProducer kafkaProducer) {
        this.kafkaProducer = kafkaProducer;
    }

    @PostMapping("/publish")
    public void publish(@RequestBody String message) {
        Objects.requireNonNull(message);

        kafkaProducer.send(message);
    }
}

With this, all the necessary code is done. Let’s now go for the configuration properties and the necessary Docker images to run all of this.

First, the “application.properties” file. It is going to contain some basic configuration properties for the producer and consumer.

server.port=8081

spring-boot-kafka.config.kafka.server=localhost
spring-boot-kafka.config.kafka.port=9092

# Kafka consumer properties
spring.kafka.consumer.bootstrap-servers=${spring-boot-kafka.config.kafka.server}:${spring-boot-kafka.config.kafka.port}
spring.kafka.consumer.group-id=example_group_id
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer

# kafka producer properties
spring.kafka.producer.bootstrap-servers=${spring-boot-kafka.config.kafka.server}:${spring-boot-kafka.config.kafka.port}
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer

If we look carefully at the property “spring.kafka.consumer.group-id”, we will see that it matches the “groupId” we previously defined on the consumer.

The key and value serializer and deserializer properties define the classes used to serialize and de-serialize the messages.

Finally, we have defined a couple of custom properties, “spring-boot-kafka.config.kafka.server” and “spring-boot-kafka.config.kafka.port”, to avoid repetition. These are the properties that are showing us a warning message.

To fix it, if we have previously added the “Spring Configuration Processor” dependency, now, we can add the file:

spring-boot-kafka/src/main/resources/META-INF/additional-spring-configuration-metadata.json

With the definition of these properties:

{
  "properties": [
    {
      "name": "spring-boot-kafka.config.kafka.server",
      "type": "java.lang.String",
      "description": "Location of the Kafka server."
    },
    {
      "name": "spring-boot-kafka.config.kafka.port",
      "type": "java.lang.String",
      "description": "Port of the Kafka server."
    }
  ]
}

We are almost there. The only thing remaining is Apache Kafka itself. Because we do not want to deal with the complexity of setting up an Apache Kafka server, we are going to leverage the power of Docker and create a “docker-compose” file to run it for us:

version: '3'

services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - 2181:2181
    container_name: zookeeper

  kafka:
    image: wurstmeister/kafka
    ports:
      - 9092:9092
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_CREATE_TOPICS: "example_topic:1:3"

As we can see, simple stuff, nothing out of the ordinary. Two images, one for Zookeeper and one for Apache Kafka, the definition of some ports (remember to match them with the ones in the application.properties file) and a few variables needed for the Apache Kafka image.

With this, we can run the docker-compose file and obtain two containers running.

Now, we can test the endpoint we have built previously. In this case, to make it simple, we are going to use curl:

curl -d '{"message":"Hello from Kafka!"}' -H "Content-Type: application/json" -X POST http://localhost:8081/kafka/publish

The result should be the published message printed by the consumer in the application logs.

And, this is all. You can find the full source code here.

Enjoy it!


Docker Java Client

There is no question about containers being one of the latest big things. They are everywhere, everyone uses them or wants to use them and the truth is they are very useful and they have changed the way we develop.

We have Docker, for example, that provides us with the docker and docker-compose command line tools to be able to run one or multiple containers with just a few lines or a configuration file. But we, developers, tend to be lazy and we like to automate things. For example, what about if, when we are running our application in our favorite IDE, the application starts the necessary containers for us? It would be nice.

The point of this article is to play a little bit with a docker java client library. The case exposed above is just an example; probably you, readers, can think of better cases but, for now, this is enough for my learning purposes.

Searching around, I have found a couple of different docker java client libraries. I have not analyzed or compared them; I just found the one hosted on GitHub first, and it looks mature and usable enough. For this reason, it is the one I am going to use for this article.

Let’s start.

First, we need to add the dependency to our pom.xml file:

<dependency>
    <groupId>com.github.docker-java</groupId>
    <artifactId>docker-java</artifactId>
    <version>3.1.1</version>
</dependency>

The main class we are going to use to execute the different instructions is the DockerClient class. This is the class that establishes the communication between our application and the docker engine/daemon in our machine. The library offers us a very intuitive builder to generate the object:

DockerClient dockerClient = DockerClientBuilder.getInstance().build();

There are some options that we can configure but, for a simple example, it is not necessary. I am just going to say there is a class called DefaultDockerClientConfig where a lot of different properties can be set. After that, we just need to call the getInstance method with our configuration object.
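
As a sketch, building the client with an explicit configuration (the docker host value is just an example) could look like:

DefaultDockerClientConfig config = DefaultDockerClientConfig.createDefaultConfigBuilder()
        .withDockerHost("unix:///var/run/docker.sock") // adjust to your environment
        .build();

DockerClient dockerClient = DockerClientBuilder.getInstance(config).build();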

Image management

Listing images

List<Image> images = dockerClient.listImagesCmd().exec();

Pulling images

dockerClient.pullImageCmd("postgres")
                .withTag("11.2")
                .exec(new PullImageResultCallback())
                .awaitCompletion(30, TimeUnit.SECONDS);

Container management

Listing containers

// List running containers
dockerClient.listContainersCmd().exec();

// List existing containers
dockerClient.listContainersCmd().withShowAll(true).exec();

Creating containers

CreateContainerResponse container = dockerClient.createContainerCmd("postgres:11.2")
                .withName("postgres-11.2-db")
                .withExposedPorts(new ExposedPort(5432, InternetProtocol.TCP))
                .exec();

Starting containers

dockerClient.startContainerCmd(container.getId()).exec();

Other

There are multiple operations we can do with containers; the above is just a short list of examples but, we can extend it with:

  • Image management
    • Listing images: listImagesCmd()
    • Building images: buildImageCmd()
    • Inspecting images: inspectImageCmd("id")
    • Tagging images: tagImageCmd("id", "repository", "tag")
    • Pushing images: pushImageCmd("repository")
    • Pulling images: pullImageCmd("repository")
    • Removing images: removeImageCmd("id")
    • Search in registry: searchImagesCmd("text")
  • Container management
    • Listing containers: listContainersCmd()
    • Creating containers: createContainerCmd("repository:tag")
    • Starting containers: startContainerCmd("id")
    • Stopping containers: stopContainerCmd("id")
    • Killing containers: killContainerCmd("id")
    • Inspecting containers: inspectContainerCmd("id")
    • Creating a snapshot: commitCmd("id")
  • Volume management
    • Listing volumes: listVolumesCmd()
    • Inspecting volumes: inspectVolumeCmd("id")
    • Creating volumes: createVolumeCmd()
    • Removing volumes: removeVolumeCmd("id")
  • Network management
    • Listing networks: listNetworksCmd()
    • Creating networks: createNetworkCmd()
    • Inspecting networks: inspectNetworkCmd().withNetworkId("id")
    • Removing networks: removeNetworkCmd("id")

And, that is all. There is a project using some of the operations listed here. It is called country, it is one of my learning projects, and you can find it here.

Concretely, you can find the code using the docker java client library here, and the code using this library here, specifically in the class PopulationDevelopmentConfig.

I hope you find it useful.


Java reflection: Accessing a private field

Sometimes, when we import 3rd-party libraries, we can find cases where the information we want has been populated in an object but, the property is a private one and there are no methods to recover it. Here is where the Java reflection capabilities can help us.

Note: Use reflection carefully and, usually, as a last resource.

Let’s say we have a simple class with a private property:

class A {
    private String value = "value";
}

We want to access the property “value” but, for obvious reasons, we cannot do it.

We can implement our reflection method to access the property value:

import java.lang.reflect.Field;

public class Reflection {

   public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
      final A a = new A();
      final Field valueField = a.getClass().getDeclaredField("value");

      // Disable the Java language access checks for this field
      valueField.setAccessible(true);

      System.out.println((String) valueField.get(a));
   }
}

With these few lines of code, we will be able to access the value.
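
In the same way, and with the same caution, we could modify the private value using the ‘set’ method; a minimal sketch continuing the example above:

valueField.set(a, "new value");

System.out.println((String) valueField.get(a)); // prints "new value"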


Groups in Regular Expressions

There is no discussing the power of regular expressions, an amazing tool with capabilities to process strings limited usually only by our imagination. Every developer should, at least, have a basic understanding of them. But, lately, I have realized not a lot of people know about the possibility of creating and labeling “groups”. Groups allow us to access, in a very simple and clear way, the expressions matching our regular expression.

Regular expressions allow us to not just match text but also to extract information for further processing. This is done by defining groups of characters and capturing them using the special parentheses “(” and “)” metacharacters. Any subpattern inside a pair of parentheses will be captured as a group. In practice, this can be used to extract information like phone numbers or emails from all sorts of data.

Here, I am just going to write a little example to show the basic behavior and, I leave to all of you to find the appropriate use cases. In the example, I am going to extract some different hashes for further processing.

public static void main(String[] args) {
    final Pattern HASH_PATTERN = Pattern.compile("^(?<md5>[0-9a-f]{32})(?:/)(?<sha1>[0-9a-f]{40})(?:/)(?<sha256>[0-9a-f]{64})$");
    final Matcher matcher = HASH_PATTERN.matcher("ce114e4501d2f4e2dcea3e17b546f339/a54d88e06612d820bc3be72877c74f257b561b19/c7be1ed902fb8dd4d48997c6452f5d7e509fbcdbe2808b16bcf4edce4c07d14e");

    if (matcher.matches()) {         
        final String md5 = matcher.group("md5");         
        final String sha1 = matcher.group("sha1");         
        final String sha256 = matcher.group("sha256");         
        ...
    }     
    ... 
}

As you can see, the example is pretty simple: it takes one line that contains a string and extracts the MD5, SHA1 and SHA256 hashes. We can see the code is easy to read and understand because everything uses human-readable labels, not just numbers, to access the groups, and there is no need to process the string with split operations or similar.

The syntax for the groups is:

(?<name>X)– X, as a named-capturing group

(?:X) – X, as a non-capturing group

With this, we can easily make our code easier to read and maintain when we are extracting information or doing some text processing.

For further information, check the Java documentation: Pattern (Java SE 10 & JDK 10)


Error Prone

We, as developers, sometimes make mistakes or add bugs to our code without realizing it. For this reason, static analyzers are a handy tool to apply during our builds or during our code verification processes.

One of these tools is Error Prone.

Error Prone is Google’s Java bug detection and static analysis tool. It is integrated into the Java compiler and catches bugs at compile time. It supports plugin checks for project-specific enforcement.

Basically, it is a tool created by Google for code analysis and error detection for the Java language. It is integrated into the compiler and tries to detect bugs at compilation time.

But, let’s see an example. Imagine we have a program with the next line of code:

String.format("Param A: {}, param B: {}, param C: {}", paramA, paramB, paramC);

Obviously, it is not correct and the error comes from, maybe, a transformation of a previous log message into a different kind of message. The compiler is not going to complain because it is a string message and not a syntax error. But the truth is there is an error.

When we try to compile the program with Error Prone, we are going to receive a compilation error message like this:

error: [FormatString] extra format arguments: used 0, provided 3
String.format("Param A: {}, param B: {}, param C: {}", paramA, paramB, paramC);
              ^
(see http://errorprone.info/bugpattern/FormatString)

We can see clearly and without any doubts there is an error. Even, a link to the error description is provided.

The proper code should be:

String.format("Param A: %s, param B: %s, param C: %s", paramA, paramB, paramC);

The easiest way to start using the tool, it is to add the maven plugin to our pom.xml file:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.3</version>
    <configuration>
        <compilerId>javac-with-errorprone</compilerId>
        <forceJavacCompilerUse>true</forceJavacCompilerUse>
        <source>8</source>
        <target>8</target>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>org.codehaus.plexus</groupId>
            <artifactId>plexus-compiler-javac-errorprone</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>com.google.errorprone</groupId>
            <artifactId>error_prone_core</artifactId>
            <version>2.0.19</version>
        </dependency>
    </dependencies>
</plugin>

For more options, just go to the installation instructions page.

The project is open source and you can see all the code in the official repository: error-prone.

I am not saying that it is going to solve all your problems but, at least, it is another tool to increase our code quality and avoid silly mistakes.
