Groups in Regular Expressions

We cannot discuss the power of regular expression, an amazing tool with unlimited (usually our imagination) capabilities to progress strings. Every developer should, at least, have a basic understanding of them. But, lately, I have realized not a lot of people knows the possibility of creating and labeling “groups”. Groups allow us to access in a very  simple and clear way to the expressions matching our regular expression.

Regular expressions allow us to not just match text but also to extract information for further processing. This is done by defining groups of characters and capturing them using the special parentheses “(” and “)” metacharacters. Any subpattern inside a pair of parentheses will be captured as a group. In practice, this can be used to extract information like phone numbers or emails from all sorts of data.

Here, I am just going to write a little example to show the basic behavior and, I leave to all of you to find the appropriate use cases. In the example, I am going to extract some different hashes for further processing.

public static void main(String[] args) {
    final Pattern HASH_PATTERN = Pattern.compile("^(?<md5>[0-9a-f]{32})(?:/)(?<sha1>[0-9a-f]{40})(?:/)(?<sha256>[0-9a-f]{64})$");
    final Matcher matcher = HASH_PATTERN.matcher("ce114e4501d2f4e2dcea3e17b546f339/a54d88e06612d820bc3be72877c74f257b561b19/c7be1ed902fb8dd4d48997c6452f5d7e509fbcdbe2808b16bcf4edce4c07d14e");

    if (matcher.matches()) {         
        final String md5 = matcher.group("md5");         
        final String sha1 = matcher.group("sha1");         
        final String sha256 = matcher.group("sha256");         
        ...
    }     
    ... 
}

As you can see, the example is pretty simple, it takes one line that contains a string and extracts the MD5, SHA1 and SHA256 hashes. We can see the code is easy to read and understand because everything is using human readable labels not just numbers to access the groups and there is no need to process the string with split operations or similars.

The syntax for the groups is:

(?<name>X)– X, as a named-capturing group

(?:X) – X, as a non-capturing group

With this, we can easily make our code easier to read and maintain when we are extracting information or doing some text processing.

For further information, check the Java documentation: Pattern (Java SE 10 & JDK 10)

Groups in Regular Expressions

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.