JVM Deep Dive

What is the JVM?

The Java Virtual Machine (JVM) is a specification that provides a runtime environment in which Java code can be executed. It is the component of the Java platform responsible for its hardware and operating system independence. The JVM has two primary functions:

  • To allow Java programs to run on any device or operating system (“Write once, run anywhere” principle).
  • To manage and optimize program memory.

There are three important distinctions that need to be made when talking about the JVM:

  • JVM specification: The specification document describes an abstract machine and formally defines what is required of a JVM implementation; it does not describe any particular implementation of the Java Virtual Machine. Implementation details are left to the creativity of implementors. For version 16, this document is The Java Virtual Machine Specification, Java SE 16 Edition, published by Oracle.
  • JVM implementations: These are concrete implementations of the JVM specification. As mentioned above, it is up to the implementors how to develop and materialise the specification. This allows different implementations to focus on improving different areas, prioritise different parts of the specification, or build non-standard implementations. Some reasons to develop an implementation are:
    • Platform support: Run Java on a platform for which Oracle does not provide a JVM.
    • Resource usage: Run Java on a device that does not have enough resources to run Oracle’s implementation.
    • Performance: Oracle’s implementation is not fast, scalable or predictable enough.
    • Licensing: Disagreement with Oracle’s licensing policy.
    • Competition: Offering an alternative.
    • Research or fun: Because, why not?
    • Some examples of implementations are Azul Zulu, Eclipse OpenJ9, GraalVM, and HotSpot.
  • JVM instances: A JVM instance is a running implementation of the JVM, that is, a JVM process executing a Java program.

JVM Architecture

The JVM consists of three distinct components:

  • Class Loader
  • Runtime Memory/Data Area
  • Execution Engine

Class Loader

Class loaders dynamically load Java classes into the JVM's runtime data areas while the program is running. There are three phases in the class loading process: loading, linking, and initialization.

Loading

Loading is the process of finding the binary representation (bytecode) of a class or interface with a particular name and creating the class or interface from it. There are three built-in class loaders in Java:

  • Bootstrap Class Loader: It loads the standard Java packages such as java.lang, java.net, java.util, and so on. Up to Java 8, these packages are stored in the rt.jar file and other core libraries in the $JAVA_HOME/jre/lib directory; since Java 9, they are packaged as modules in the runtime image.
  • Extension Class Loader: The extension class loader is a child of the bootstrap class loader and loads the extensions of the standard core Java classes so that they are available to all applications running on the platform. Extensions are stored in the $JAVA_HOME/jre/lib/ext directory. Java 9 removed the extension mechanism and replaced this loader with the platform class loader.
  • Application Class Loader: It loads the classes found on the classpath. By default, the classpath is the current directory, but it can be changed with the -classpath (-cp) option or the CLASSPATH environment variable.

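Class loading follows a hierarchical, parent-delegation pattern (Bootstrap -> Extension -> Application). When a class loader is asked to load a class, it first delegates the request to its parent; only if the parent cannot find the class does the child try to load it itself. If no class loader in the chain can load the class, a ClassNotFoundException is thrown (or a NoClassDefFoundError, when code that was compiled against the class can no longer find it at runtime).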
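The delegation chain can be inspected from a running program. The following is a minimal sketch (class and method names are illustrative, and com.example.DoesNotExist is a hypothetical class that is assumed not to exist); it walks from a class's loader up through getParent(), where a null loader stands for the bootstrap loader. The exact loader class names it prints vary between JDK versions.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderDemo {

    // Walks from the loader that defined `type` up through its parents.
    // getClassLoader() and getParent() return null for the bootstrap loader.
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        // Application class: application loader, then extension/platform loader, then bootstrap.
        System.out.println(loaderChain(ClassLoaderDemo.class));
        // Core class: defined by the bootstrap loader itself, so this prints [bootstrap].
        System.out.println(loaderChain(String.class));

        try {
            // Hypothetical class name: every loader delegates to its parent first,
            // and loading fails only after no loader in the chain finds the class.
            Class.forName("com.example.DoesNotExist");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```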

Linking

After a class is loaded into memory, it undergoes the linking process. Linking takes a class or interface and combines it into the runtime state of the JVM so that it can be executed. Linking includes the following steps:

  • Verification: This phase checks the structural correctness of the .class file against a set of constraints or rules. If verification fails, the JVM throws a java.lang.VerifyError. (A class compiled for a newer Java version than the running JVM is rejected even earlier, with an UnsupportedClassVersionError.)
  • Preparation: In this phase, the JVM allocates memory for the static fields of a class or interface, and initializes them with default values.
  • Resolution: In this phase, symbolic references in the runtime constant pool are replaced with direct references.

Initialisation

Initialisation involves executing the class or interface initialisation method (<clinit>). This runs the static initialiser blocks and assigns the declared values to the static variables, in the order they appear in the source. Constructors are not part of this step; they run only when instances are created. This is the final stage of class loading.
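The gap between preparation (static fields set to default values) and initialisation (declared values assigned in textual order) can be observed with a small sketch. The names are illustrative; the trick is that a static method called from an earlier initialiser can read a later field before its own initialiser has run.

```java
public class InitOrderDemo {

    static class Config {
        // Preparation has already set both fields to their default value 0.
        // Initialisation then runs these initialisers top to bottom:
        static int observed = readCounter(); // runs first, so it sees counter == 0
        static int counter = 42;             // assigned only afterwards

        static {
            System.out.println("Config initialised");
        }

        static int readCounter() {
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println("Before first use of Config");
        // The first access to a static field triggers initialisation of Config.
        System.out.println("observed = " + Config.observed); // observed = 0
        System.out.println("counter  = " + Config.counter);  // counter  = 42
    }
}
```

Because counter is not final, it is not a compile-time constant, so its value really is assigned during initialisation rather than being inlined by the compiler.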

Runtime Data Area

The Runtime Data Area is divided into five major components:

  • Method Area
  • Heap Area
  • Stack Area
  • Program Counter (PC) Registers
  • Native Method Stacks

Method Area

All class-level data, such as the runtime constant pool, field and method data, and the code for methods and constructors, is stored here. If the memory available in the method area cannot satisfy an allocation request, the JVM throws an OutOfMemoryError.

It is important to note that the method area is created on virtual machine start-up, and there is only one method area per JVM.

Heap Area

All the objects and their corresponding instance variables are stored here. This is the run-time data area from which memory for all class instances and arrays is allocated.

Likewise, the heap is created on virtual machine start-up, and there is only one heap area per JVM.

Stack Area

Whenever a new thread is created in the JVM, a separate runtime stack is created for it at the same time. All local variables, method calls, and partial results are stored in the stack area. If the computation in a thread requires a larger stack than is permitted, the JVM throws a StackOverflowError.

For every method call, an entry called a stack frame is pushed onto the stack. When the method call completes, its stack frame is destroyed.
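Since each call pushes a new frame, unbounded recursion eventually exhausts the thread's stack. This sketch (names are illustrative) counts how many frames fit before the StackOverflowError; the number depends on the JVM, the frame size, and the stack size, which HotSpot lets you change with the -Xss option.

```java
public class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new stack frame; none is popped until the stack runs out.
    private static void recurse() {
        depth++;
        recurse();
    }

    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are unwound as the error propagates, so the thread can continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```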

The Stack Frame is divided into three sub-parts:

  • Local Variables: Each frame contains an array of variables known as its local variables. All local variables and their values are stored here. The length of this array is determined at compile-time.
  • Operand Stack: Each frame contains a last-in-first-out (LIFO) stack known as its operand stack. This acts as a runtime workspace to perform any intermediate operations. The maximum depth of this stack is determined at compile-time.
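  • Frame Data: Holds a reference to the runtime constant pool of the current method's class, through which the symbols the method refers to are resolved, along with the information needed to return from the method and to dispatch exceptions to the matching catch block.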
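To make the local variable array and the operand stack concrete, here is a small method annotated with the bytecode javac typically emits for it (you can check the listing with javap -c FrameDemo). The class name is illustrative; the point is that arguments live in local variable slots and arithmetic happens by pushing and popping the operand stack.

```java
public class FrameDemo {

    // Expected bytecode (static method, so the arguments occupy local slots 0..2):
    //   iload_0   // push local 0 (a) onto the operand stack
    //   iload_1   // push local 1 (b)
    //   iadd      // pop both, push a + b
    //   iload_2   // push local 2 (c)
    //   imul      // pop both, push (a + b) * c
    //   ireturn   // pop the result and hand it to the caller's frame
    // The compiler records max_locals = 3 and max_stack = 2 for this frame.
    static int compute(int a, int b, int c) {
        return (a + b) * c;
    }

    public static void main(String[] args) {
        System.out.println(compute(2, 3, 4)); // 20
    }
}
```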

Program Counter (PC) Registers

The JVM supports multiple threads at the same time. Each thread has its own PC Register to hold the address of the currently executing JVM instruction. Once the instruction is executed, the PC register is updated with the next instruction.

Native Method Stacks

The JVM contains stacks that support native methods. These methods are written in a language other than Java, such as C and C++. For every new thread, a separate native method stack is also allocated.

Execution Engine

Once the bytecode has been loaded into the main memory, and details are available in the runtime data area, the next step is to run the program. The Execution Engine handles this by executing the code present in each class.

However, before executing the program, the bytecode needs to be converted into machine language instructions. The JVM can use an interpreter or a JIT compiler for the execution engine.

The Execution Engine has three main components:

Interpreter

The interpreter reads and executes the bytecode instructions one at a time. Because of this instruction-by-instruction execution, the interpreter is comparatively slow. In addition, if a method is called multiple times, it has to be interpreted again on every call.

JIT Compiler

The JIT compiler offsets this disadvantage of the interpreter. The Execution Engine starts by interpreting the bytecode, but when it detects frequently executed (hot) code, it uses the JIT compiler to compile that code to native code. The native code is then used directly for repeated method calls, which improves the performance of the system. The JIT compiler has the following components:

  • Intermediate Code Generator: Generates intermediate code.
  • Code Optimizer: Optimizes the intermediate code for better performance.
  • Target Code Generator: Converts intermediate code to native machine code.
  • Profiler: Finds the hotspots (code that is executed repeatedly).
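JIT warm-up can be seen with a method that is called many times. In the sketch below (names and iteration counts are arbitrary choices), the later rounds are usually faster than the first, because by then the method has been profiled as hot and compiled. Timings vary between machines and runs, so treat them as indicative only. On HotSpot, running it with java -XX:+PrintCompilation HotLoop also lists methods as they are compiled.

```java
public class HotLoop {

    // Called tens of thousands of times, so the profiler should flag it as hot.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            long start = System.nanoTime();
            long result = 0;
            for (int call = 0; call < 10_000; call++) {
                result += sumOfSquares(1_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            // Printing the result keeps the work from being optimised away entirely.
            System.out.println("round " + round + ": " + micros + " microseconds (result " + result + ")");
        }
    }
}
```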

Garbage Collector

The Garbage Collector (GC) collects and removes unreferenced objects from the heap area. It automatically reclaims the memory occupied by objects that the program can no longer reach. Garbage collection makes Java memory-efficient because it frees space for new objects.

Garbage collection is performed automatically by the JVM whenever it is needed (typically when a memory area fills up), so it does not need to be handled explicitly. A collection can also be requested by calling System.gc(), but the JVM does not guarantee that it will run.

It involves two phases:

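  • Mark: In this step, the GC traverses the object graph from the GC roots (such as local variables and static fields) and marks every reachable object; anything left unmarked is unused.
  • Sweep: In this step, the GC removes the unmarked objects and reclaims their memory.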
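The two phases can be illustrated on a toy object graph. This is a simplified model, not how any real JVM collector is implemented: the Node class and its refs list are hypothetical stand-ins for heap objects and their references. Note that the unreachable cycle c <-> d is still collected, because marking starts from the roots rather than counting references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class MarkSweepSketch {

    // A toy heap object that only knows which other objects it references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Mark: starting from the roots, flag every object that is still reachable.
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) {
                continue;              // already visited (also handles cycles)
            }
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    // Sweep: remove every unmarked object and clear the marks for the next cycle.
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);                 // a -> b
        c.refs.add(d);                 // c -> d
        d.refs.add(c);                 // d -> c: a cycle that no root reaches
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Arrays.asList(a));        // a is the only GC root, e.g. a local variable
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("Freed: " + demo()); // Freed: [c, d]
    }
}
```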

The JVM offers several garbage collectors:

  • Serial GC: This is the simplest implementation of GC, and is designed for small applications running on single-threaded environments. It uses a single thread for garbage collection. When it runs, it leads to a stop-the-world event where the entire application is paused.
  • Parallel GC: Also known as the Throughput Collector, this was the default GC in the JVM up to Java 8. It uses multiple threads for garbage collection but still pauses the application while running.
  • Garbage First (G1) GC: G1GC was designed for multi-threaded applications that have a large heap size available (more than 4GB). It partitions the heap into a set of equal-sized regions and uses multiple threads to scan them. G1GC identifies the regions with the most garbage and collects those regions first. It has been the default GC since Java 9.
  • Concurrent Mark Sweep (CMS) GC: Performs most of its work concurrently with the application threads to keep pauses short. It was deprecated in Java 9 and removed in Java 14.
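To check which collector a JVM is actually using, the standard java.lang.management API exposes one GarbageCollectorMXBean per collector. In this sketch (the class name is illustrative), running it with -XX:+UseSerialGC, -XX:+UseParallelGC or -XX:+UseG1GC should change the reported names; the exact names depend on the JVM and its version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {

    // Names of the collectors the running JVM reports (often one per generation).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time in milliseconds.
            System.out.printf("%-25s collections=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```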

Java Native Interface (JNI)

JNI acts as a bridge that lets Java code call, and be called by, code written in other programming languages such as C and C++. This is especially helpful when you need functionality that Java does not fully support, such as platform-specific features that can only be written in C.

Native Method Libraries

Native Method Libraries are libraries that are written in other programming languages, such as C, C++, and assembly. These libraries are usually present in the form of .dll or .so files. These native libraries can be loaded through JNI.
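On the Java side, a JNI method is declared with the native keyword and its library is loaded with System.loadLibrary. The sketch below assumes a hypothetical library named hello that implements greet; that library is not part of this document, so the program only calls greet if the library is found and otherwise reports where it looked. The matching C header can be generated with javac -h . NativeHello.java.

```java
public class NativeHello {

    // Declared in Java, implemented in C/C++; bound through JNI when first called.
    // Hypothetical: assumes a native library "hello" provides this function.
    private static native String greet(String name);

    // Platform-specific file name for the hypothetical "hello" library
    // (for example libhello.so on Linux or hello.dll on Windows).
    static String libraryFileName() {
        return System.mapLibraryName("hello");
    }

    // Searches java.library.path; returns false instead of failing when the library is absent.
    static boolean tryLoad() {
        try {
            System.loadLibrary("hello");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (tryLoad()) {
            System.out.println(greet("JVM"));
        } else {
            System.out.println(libraryFileName() + " not found on java.library.path="
                    + System.getProperty("java.library.path"));
        }
    }
}
```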

JVM memory structure

As explained earlier, the JVM manages memory automatically with the help of the garbage collector. Memory management is the process of allocating and de-allocating objects in memory.

The main memory areas have already been described in the “Runtime Data Area” section, but let’s explore in more detail how they work.

Heap Area

Heap space in Java is used for dynamic memory allocation of Java objects and JRE classes at runtime. New objects are always created in heap space, while references to them are held in stack memory (local variables) or in other objects. These objects have global access: any part of the application holding a reference can reach them. The heap is further divided into smaller parts called generations:

  • Young Generation: This is where all new objects are allocated and aged. A minor garbage collection occurs when it fills up.
    • Eden space: It is part of the Young Generation space. When we create an object, the JVM allocates memory from this space.
    • Survivor space: It is also part of the Young Generation space. It contains objects that have survived at least one minor GC. There are two survivor spaces, Survivor Space 0 and Survivor Space 1.
  • Old or Tenured Generation: This is where long-lived objects are stored. Each object in the Young Generation has an age, the number of minor GCs it has survived; when that age reaches a threshold, the object is moved to the Old Generation.
  • Permanent Generation: This contains JVM metadata for the runtime classes and application methods. It existed up to Java 7 and was replaced by Metaspace in Java 8.

This area works as follows:

  1. When an object is created, it is first allocated in Eden space. Eden is not very big and fills up quite fast; when it does, the garbage collector runs a minor GC that clears all unreferenced objects in it.
  2. Each minor GC copies the objects that survive, from Eden and from the survivor space currently in use, into the other (empty) survivor space. Survivor Space 0 and Survivor Space 1 therefore swap roles after every minor GC.
  3. If an object survives X rounds of the garbage collector, it is likely to be long-lived, and it is moved into the Old space. X is the tenuring threshold; it depends on the JVM implementation, and HotSpot lets you cap it with -XX:MaxTenuringThreshold.
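The three steps above can be modelled as a toy simulation. This is an illustration only, not HotSpot's implementation: the Obj class, its live flag (standing in for "still reachable") and the fixed TENURING_THRESHOLD of 3 are all assumptions made for the sketch; real JVMs adjust the threshold at runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class GenerationalSketch {
    // Hypothetical fixed threshold; real JVMs adapt it at runtime.
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        int age;              // minor GCs survived so far
        boolean live = true;  // stands in for "still reachable from a GC root"
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // survivor space currently in use
    List<Obj> toSurvivor = new ArrayList<>();   // empty survivor space
    final List<Obj> old = new ArrayList<>();

    // Step 1: every new object starts in Eden.
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // Steps 2 and 3: copy live objects out of Eden and the "from" survivor space,
    // promote the old enough ones, then swap the roles of the two survivor spaces.
    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (Obj o : candidates) {
            if (!o.live) {
                continue;         // unreachable objects are simply not copied
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);       // promoted to the Old Generation
            } else {
                toSurvivor.add(o);
            }
        }
        eden.clear();
        fromSurvivor.clear();
        List<Obj> emptied = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = emptied;
    }

    static List<String> demo() {
        GenerationalSketch heap = new GenerationalSketch();
        heap.allocate("cache");                 // long-lived object
        Obj temp = heap.allocate("temp");
        temp.live = false;                      // dropped before the first minor GC
        for (int i = 0; i < TENURING_THRESHOLD; i++) {
            heap.minorGc();
        }
        List<String> promoted = new ArrayList<>();
        for (Obj o : heap.old) {
            promoted.add(o.name);
        }
        return promoted;
    }

    public static void main(String[] args) {
        System.out.println("Old generation: " + demo()); // Old generation: [cache]
    }
}
```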

Metaspace (PermGen)

Metaspace is a new memory space starting from the Java 8 version; it has replaced the older PermGen memory space.

Metaspace is allocated from native memory, separate from the Java heap. The JVM keeps track of loaded class metadata in Metaspace, including method metadata, bytecode, names, and JIT information. Static variables themselves are not kept there: in HotSpot they are stored on the heap, together with the class's java.lang.Class object.

Class metadata is the runtime representation of Java classes within a JVM process: essentially any information the JVM needs to work with a Java class. That includes, but is not limited to, the runtime representation of data from the JVM class file format.

Metaspace memory is only released when a GC runs and unloads class loaders, together with the classes they defined.
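The JVM's memory pools, including the young and old generation spaces and Metaspace, can be listed through java.lang.management. In this sketch (the class name is illustrative), pools such as Metaspace are reported with type NON_HEAP, which matches the point that Metaspace lives outside the Java heap. Pool names differ between collectors and JVM implementations.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryPools {

    // Maps each memory pool the JVM reports to whether it is heap or non-heap memory.
    static Map<String, MemoryType> poolTypes() {
        Map<String, MemoryType> types = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            types.put(pool.getName(), pool.getType());
        }
        return types;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-35s %-16s used=%,d bytes%n",
                    pool.getName(), pool.getType(), pool.getUsage().getUsed());
        }
    }
}
```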

Performance Enhancements

Oracle's documentation describes several performance enhancements in the JVM:

  • Compact Strings
  • Tiered Compilation
  • Compressed Ordinary Object Pointer
  • Zero-Based Compressed Ordinary Object Pointers
  • Escape Analysis

For details, the documentation gives a solid explanation of each of these topics.

Tuning the Garbage Collector

Tuning should be the last option for increasing the throughput of the application, used only when we see a drop in performance caused by long GC pauses, for example pauses that make the application time out.

Java provides a lot of memory switches that we can use to set the memory sizes and their ratios. Some of the commonly used memory switches are:

  • -Xms: Sets the initial heap size when the JVM starts.
  • -Xmx: Sets the maximum heap size.
  • -Xmn: Sets the size of the young generation (the rest of the heap is the old generation).
  • -XX:PermSize: Sets the initial size of the Permanent Generation (Java 7 and earlier; the Metaspace counterpart is -XX:MetaspaceSize).
  • -XX:MaxPermSize: Sets the maximum size of the Permanent Generation (Java 7 and earlier; the Metaspace counterpart is -XX:MaxMetaspaceSize).
  • -XX:SurvivorRatio: Sets the ratio of Eden space to one survivor space.
  • -XX:NewRatio: Sets the ratio of old to young generation sizes. The default value is 2.
  • -XX:+UseSerialGC: Enables the Serial garbage collector.
  • -XX:+UseParallelGC: Enables the Parallel garbage collector.
  • -XX:+UseConcMarkSweepGC: Enables the CMS garbage collector (removed in Java 14).
  • -XX:ParallelCMSThreads=<n>: Sets the number of threads the CMS collector uses.
  • -XX:+UseG1GC: Enables the G1 garbage collector.
  • -XX:+HeapDumpOnOutOfMemoryError: Writes a heap dump file the next time an OutOfMemoryError occurs.
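A quick way to confirm that heap switches took effect is to print what the Runtime API reports, for example after launching with java -Xms64m -Xmx256m HeapSizes. This is a sketch with an illustrative class name; the reported values only approximate the flags (maxMemory() can be slightly below -Xmx, and returns Long.MAX_VALUE if there is no limit).

```java
public class HeapSizes {
    private static final long MB = 1024 * 1024;

    // Heap currently in use: committed heap minus the part not yet occupied by objects.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() roughly reflects -Xmx; totalMemory() starts near -Xms and can grow up to it.
        System.out.println("max   = " + rt.maxMemory() / MB + " MB");
        System.out.println("total = " + rt.totalMemory() / MB + " MB");
        System.out.println("used  = " + usedBytes() / MB + " MB");
    }
}
```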

Tools

Beyond all this configuration, it is important to monitor the memory of our Java applications. There are multiple tools we can use to do this:

jstat

It is a utility that provides information about the performance and resource consumption of running Java applications. With the garbage collection option (jstat -gc <pid>), it reports the following information for a Java process:

S0C – Current survivor space 0 capacity (KB)
S1C – Current survivor space 1 capacity (KB)
S0U – Survivor space 0 utilization (KB)
S1U – Survivor space 1 utilization (KB)
EC – Current eden space capacity (KB)
EU – Eden space utilization (KB)
OC – Current old space capacity (KB)
OU – Old space utilization (KB)
MC – Metaspace capacity (KB)
MU – Metaspace utilization (KB)
CCSC – Compressed class space capacity (KB)
CCSU – Compressed class space used (KB)
YGC – Number of young generation garbage collector events
YGCT – Young generation garbage collection time (s)
FGC – Number of full GC events
FGCT – Full garbage collection time (s)
GCT – Total garbage collection time (s)
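To watch these columns change, jstat needs a process that actually produces garbage. The sketch below (class name and allocation sizes are arbitrary, and it assumes Java 9 or later for ProcessHandle) prints its own PID and keeps allocating short-lived arrays, so running jstat -gc <pid> 1000 in another terminal should show Eden usage (EU) rising and falling while YGC increases.

```java
public class GcWorkload {

    // Allocates short-lived 64 KB arrays so Eden fills up and minor GCs keep happening.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64 * 1024]; // unreachable after each iteration
            garbage[i % garbage.length] = (byte) i;
            checksum += garbage[i % garbage.length];
        }
        return checksum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("PID: " + ProcessHandle.current().pid());
        System.out.println("In another terminal: jstat -gc <pid> 1000");
        for (int round = 0; round < 60; round++) {
            churn(10_000);
            Thread.sleep(1_000);
        }
    }
}
```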

jmap

It is a utility that prints memory-related statistics for a running VM or a core file. Oracle recommends using the newer jcmd utility instead of jmap for enhanced diagnostics and reduced performance overhead.

jcmd

It is a utility used to send diagnostic command requests to the JVM; these requests are useful for controlling, troubleshooting, and diagnosing the JVM and Java applications. It must be used on the same machine where the JVM is running, with the same effective user and group identifiers that were used to launch the JVM.

jhat

It is a utility that provides a convenient, browser-based Heap Analysis Tool (HAT). The tool parses a heap dump in binary format (e.g., a heap dump produced by jcmd). Note that jhat was removed in JDK 9.

It differs from the other tools in that, when we execute it, it starts a web server that we can open in a browser to read the results.

VisualVM

VisualVM gives detailed information about Java applications while they run on a JVM, either on the local system or on a remote one. It can also capture data about the JVM software and save it to the local system. VisualVM can do CPU sampling and memory sampling, run garbage collectors, analyze heap errors, take snapshots, and more.

JConsole

It is a graphical tool for monitoring the JVM and Java applications on a local or remote machine. JConsole uses the Java Management Extensions (JMX) technology built into the JVM to provide information on the performance and resource consumption of applications running on the Java platform.

Memory Analyzer (MAT)

The Eclipse Memory Analyzer is a fast and feature-rich graphical Java heap analyzer that helps you find memory leaks and reduce memory consumption.
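Heap analyzers such as MAT (and the older jhat) work on heap dumps in the binary HPROF format. Besides jcmd <pid> GC.heap_dump <file>, a HotSpot JVM can write a dump of itself through the HotSpotDiagnosticMXBean. This sketch assumes a HotSpot-based JDK (the com.sun.management API is JDK-specific), and the file name app-heap.hprof is arbitrary.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapDumper {

    // Writes an HPROF heap dump of the current JVM; liveOnly = true dumps only reachable objects.
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("app-heap.hprof"); // arbitrary file name
        Files.deleteIfExists(file);              // dumpHeap fails if the file already exists
        dump(file, true);
        System.out.println("Heap dump written to " + file.toAbsolutePath()
                + " (" + Files.size(file) + " bytes); open it in MAT");
    }
}
```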
