Java regular expression library benchmarks

While searching for the best regular expression (regex) parser available for Java, I wanted to see for myself how the libraries compare, so I ran some benchmarks. The table below shows typical results for the following four regular expressions:

"^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)" , "(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)", "usd [+-]?[0-9]+.[0-9][0-9]" and "\\b(\\w+)(\\s+\\1)+\\b".
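For reference, here are the four expressions written as Java string literals and compiled with java.util.regex. This is a minimal sketch, not the benchmark's regtest.java: the class name, the helper methods and the sample inputs are hypothetical. Note that the "." in the currency expression is unescaped, so it matches any character rather than only a decimal point; the sketch keeps the expression exactly as listed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** The four benchmark expressions compiled with java.util.regex (illustrative inputs only). */
public class FourPatterns {
    // Anchored URL pattern: group 3 is the host, group 5 the port, group 6 the path.
    static final Pattern URL_ANCHORED =
            Pattern.compile("^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)");
    // The same URL pattern without the leading ^ anchor.
    static final Pattern URL_UNANCHORED =
            Pattern.compile("(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)");
    // Unescaped '.', so any character is accepted between the integer and fraction digits.
    static final Pattern CURRENCY =
            Pattern.compile("usd [+-]?[0-9]+.[0-9][0-9]");
    // Backreference: a word followed by one or more whitespace-separated repeats of itself.
    static final Pattern REPEATED_WORD =
            Pattern.compile("\\b(\\w+)(\\s+\\1)+\\b");

    /** Returns the host part of a URL, or null if the anchored pattern does not match. */
    static String host(String url) {
        Matcher m = URL_ANCHORED.matcher(url);
        return m.find() ? m.group(3) : null;
    }

    /** Returns the first run of repeated words in the text, or null if there is none. */
    static String repeatedWord(String text) {
        Matcher m = REPEATED_WORD.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(host("http://www.example.com:8080/index.html")); // www.example.com
        System.out.println(CURRENCY.matcher("price: usd +1234.56").find()); // true
        System.out.println(repeatedWord("this is the the end"));            // the the
    }
}
```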

It shouldn't matter which broadband connection you use to download this page and its attachments, as the content is quite small (and it looks nice enough on a mobile phone)!

Update: I have rerun the benchmarks with the latest versions of the packages I could find, and tweaked the results. The large string block is only run 10 times per regular expression, as it takes over 1 second for some libraries.
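The iteration scheme described above (10,000 runs per expression for ordinary inputs, only 10 for the large string block) could be expressed along the following lines. This is a hypothetical helper, not the benchmark's own code; in particular, the 10,000-character cut-off used to decide what counts as "large" is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: times one compiled pattern against one input. */
public class IterationTimer {
    static final int SMALL_ITERATIONS = 10000;      // ordinary test strings
    static final int LARGE_ITERATIONS = 10;         // the large string block
    static final int LARGE_INPUT_THRESHOLD = 10000; // assumed cut-off, in characters

    /** Outcome of one timed run. */
    static final class Result {
        final int iterations;
        final int hits;
        final long millis;

        Result(int iterations, int hits, long millis) {
            this.iterations = iterations;
            this.hits = hits;
            this.millis = millis;
        }
    }

    /** Picks the iteration count from the input size. */
    static int iterationsFor(String input) {
        return input.length() >= LARGE_INPUT_THRESHOLD ? LARGE_ITERATIONS : SMALL_ITERATIONS;
    }

    /** Runs find() repeatedly; counting hits keeps each result observable. */
    static Result time(Pattern p, String input) {
        int iterations = iterationsFor(input);
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (p.matcher(input).find()) {
                hits++;
            }
        }
        long millis = (System.nanoTime() - start) / 1000000L;
        return new Result(iterations, hits, millis);
    }

    public static void main(String[] args) {
        Pattern url = Pattern.compile("^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)");
        // Build an illustrative large input well above the assumed threshold.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            big.append("http://example.com/page").append(i).append(' ');
        }
        Result small = time(url, "http://example.com/index.html");
        Result large = time(url, big.toString());
        System.out.println(small.iterations + " iterations, " + small.hits + " hits, " + small.millis + "ms");
        System.out.println(large.iterations + " iterations, " + large.hits + " hits, " + large.millis + "ms");
    }
}
```

Wall-clock timings like these include JIT warm-up, so repeating each measurement and discarding the first pass tends to give steadier numbers.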

Regular expression library | Version | Time for 10,000 iterations (JDK 1.6.x) | Time for 10,000 iterations (JDK 1.4.2_19) | Parses Perl 5 syntax regex correctly?
--- | --- | --- | --- | ---
org.apache.regexp.* | 1.5 | 14372 ms | 25590 ms | false (fails on last regex on all strings)
com.stevesoft.pat.Regex | 1.5.3 | 17514 ms | 61702 ms | false (fails on last regex on last string)
com.ibm.regex.RegularExpression | 1.0.2 | 7950 ms | 20524 ms | false (fails on last regex on last string)
kmy.regex.util.Regex | 0.1.2 | 1317 ms | 641 ms | false (fails on last regex on last string)
java.util.regex.Pattern | 1.4 | 6556 ms | 12117 ms | false (fails on last regex on last string)
jregex.Pattern | 1.2_01 | 882 ms | 820 ms | true
org.apache.oro.text.regex.Perl5Matcher | 2.0.8 | 1071 ms | 1387 ms | true
RegularExpression.RE | 1.1 | 1400 ms | 6025 ms | false (fails on backreference and URL tests)
gnu.rex.Rex | 0.01 | 1 ms | 13 ms | false (fails on backreference and URL tests)
dk.brics.automaton.RegExp | 1.7-1 | 353 ms | 421 ms | false (though if you drop the ^, it works well)
com.karneim.util.collection.regex.Pattern | 1.1.1 | 321 ms | 402 ms | false (fails on URL test)
monq.jfa.Regexp | 1.1.1 | 327 ms | n/a (JDK 1.5+ only) | false (fails on backreference test)
com.ibm.icu.text.UnicodeSet (ICU4J) | 4.4.1 | 36 ms | n/a (JDK 1.5+ only) | false (fails on backreference test)
org.apache.xerces.impl.xpath.regex.RegularExpression | 2.9.0 | 11526 ms | 26755 ms | true
gnu.regexp.RE | 1.1.4 | 2084 ms | 19435 ms | true
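The dk.brics.automaton note in the table points at a semantic difference between libraries: some automaton-based matchers only test whether the whole input matches, in which case a leading ^ is redundant (and, depending on the library's syntax, may not be treated as an anchor at all). The sketch below uses java.util.regex, not dk.brics, to illustrate the same distinction between whole-input matching (Matcher.matches()) and substring search (Matcher.find()); it is an illustration, not a claim about how each benchmarked library works internally, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Shows when the leading ^ of the URL expression changes the outcome. */
public class AnchorDemo {
    static final String ANCHORED = "^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)";
    static final String UNANCHORED = "(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)";

    /** Substring search: the ^ restricts the match to the start of the input. */
    static boolean search(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** Whole-input match: matches() is implicitly anchored at both ends, so ^ adds nothing. */
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "http://example.com/index.html", "see: http://example.com/" };
        for (String input : inputs) {
            System.out.println(input);
            System.out.println("  find():    anchored=" + search(ANCHORED, input)
                    + ", unanchored=" + search(UNANCHORED, input));
            System.out.println("  matches(): anchored=" + wholeMatch(ANCHORED, input)
                    + ", unanchored=" + wholeMatch(UNANCHORED, input));
        }
    }
}
```

With find(), the two URL expressions disagree on the second input because only the unanchored one may start matching mid-string; with matches(), they always agree.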
A more detailed breakdown of the above benchmarks is also available.

Anders Møller gave some hints for making the dk.brics.automaton.RegExp package even faster!

Bryan Davis has contributed a method to break down the benchmark details further. Feel free to take the current Java source, regtest.java, which includes this update.
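A per-expression, per-input breakdown of this kind can be sketched as a table of timings indexed by regex and input string. This is not Bryan Davis's contributed code (that lives in regtest.java); it is a hypothetical illustration using java.util.regex, and it deliberately compiles each pattern outside the timed loop, which may differ from how the benchmark itself measures.

```java
import java.util.regex.Pattern;

/** Hypothetical breakdown: one timing per (regex, input) pair instead of a single total. */
public class Breakdown {
    /** Returns millis[r][s]: elapsed time for regex r against input s over the given iterations. */
    static long[][] breakdown(String[] regexes, String[] inputs, int iterations) {
        long[][] millis = new long[regexes.length][inputs.length];
        for (int r = 0; r < regexes.length; r++) {
            Pattern p = Pattern.compile(regexes[r]); // compiled once, outside the timed loop
            for (int s = 0; s < inputs.length; s++) {
                long start = System.nanoTime();
                for (int i = 0; i < iterations; i++) {
                    p.matcher(inputs[s]).find();
                }
                millis[r][s] = (System.nanoTime() - start) / 1000000L;
            }
        }
        return millis;
    }

    public static void main(String[] args) {
        String[] regexes = {
            "^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)",
            "\\b(\\w+)(\\s+\\1)+\\b"
        };
        // Illustrative inputs, not the benchmark's test strings.
        String[] inputs = { "http://www.example.com/index.html", "this is the the end" };
        long[][] millis = breakdown(regexes, inputs, 10000);
        for (int r = 0; r < regexes.length; r++) {
            for (int s = 0; s < inputs.length; s++) {
                System.out.println("regex " + r + ", input " + s + ": " + millis[r][s] + "ms");
            }
        }
    }
}
```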

This test was run on a Core 2 Quad Q8200 (2.33 GHz) with 8 GB of RAM using j2sdk1.6.0_20 (and j2sdk1.4.2_19 for comparison). Please run the test on your own machine, and I will post the details here once I work out a good way of analysing the data. If you know of an easy way to produce a more convenient output format for the tests, please feel free to make the changes and email them to me. The source code requires the jars for all of the regular expression libraries above, which can be found by following the links in the table. A simple compile.bat/run.sh will compile and run the tests for you, provided all the jars are in the current working directory.
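On the output-format question, one simple option is to emit one comma-separated line per measurement, which spreadsheets and most analysis tools can load directly. The sketch below is only a suggestion and is not part of the benchmark: the column names are hypothetical, the data row is a placeholder rather than a real result, and fields are quoted in RFC 4180 style so that regular expressions containing commas or quotes survive intact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CSV output for benchmark results. */
public class CsvReport {
    /** Quotes a field if it contains a comma, double quote or line break (RFC 4180 style). */
    static String escape(String field) {
        if (field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0) {
            return '"' + field.replace("\"", "\"\"") + '"';
        }
        return field;
    }

    /** Joins escaped fields into one CSV line. */
    static String line(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        rows.add(line("library", "regex", "input", "jdk", "millis"));
        // Placeholder row showing the layout; the numbers are not benchmark results.
        rows.add(line("java.util.regex.Pattern", "\\b(\\w+)(\\s+\\1)+\\b", "0", "1.6", "0"));
        for (String row : rows) {
            System.out.println(row);
        }
    }
}
```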

If you know of any other Java regular expression libraries, please do let me know as well :)


This webpage is Copyright (c) 2002,2005,2010 Damien Mascord tusker@tusker.org