While searching for the best regular expression (regex) library available for Java, I ran some benchmarks to satisfy myself. The table below shows typical results for matching the following three regular expressions:
"^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)", "usd [+-]?[0-9]+.[0-9][0-9]" and "\\b(\\w+)(\\s+\\1)+\\b".
Regular expression library | Time taken for 10,000 iterations | Parses Perl 5 regex correctly? |
---|---|---|
org.apache.regexp.* 1.2 | 6059ms | true |
com.stevesoft.pat.Regex 1.5.3 | 6479ms | true |
com.ibm.regex.RegularExpression 1.0.2 | 2494ms | true |
gnu.regexp.RE 1.1.4 | 36032ms | true |
kmy.regex.util.Regex 0.1.2 | 5157ms | true |
java.util.regex.Pattern 1.4 | 1122ms | true |
jregex.Pattern 1.2_01 | 1432ms | true |
org.apache.oro.text.regex.Perl5Matcher 2.0.6 | 2263ms | true |
RegularExpression.RE 1.0 | 3946ms | false - fails on URL test |
gnu.rex.Rex ??? | ???ms | unknown |
dk.brics.automaton.RegExp 1.2 | 511ms | false - fails on URL test |
com.karneim.util.collection.regex.Pattern 1.1.1 | 543ms | false - fails on URL test |
Bryan Davis has contributed a method to break down the benchmark details further. Feel free to take the current Java source, regtest.java, which includes this update.
This test was run on a PIII-650 with 288 MB of RAM, using j2sdk1.4.0_01. Please run the test on your own machine, and I will post the details here once I work out a good way of analysing the data (if you know of an easier output format for the tests, please feel free to make the changes and email them to me). The source code requires the jars from all of the above regular expression libraries, which can be found by clicking the links in the above table. A simple compile.bat will compile and run the tests for you, provided you have all the jars in the current working directory.
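If you only want a rough feel for the kind of measurement involved before fetching all the jars, the following is a minimal sketch that uses nothing but java.util.regex. It is not regtest.java: the class name, method names and sample input strings are my own assumptions for illustration, and it uses System.nanoTime(), which needs a newer JDK than the 1.4 used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical timing sketch; names and sample inputs are illustrative only.
class RegexTimingSketch {
    // The three expressions from the benchmark, as Java string literals.
    static final String URL_REGEX = "^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)";
    static final String USD_REGEX = "usd [+-]?[0-9]+.[0-9][0-9]";
    static final String DUP_REGEX = "\\b(\\w+)(\\s+\\1)+\\b";

    // Compiles the pattern once, then calls find() 'iterations' times.
    // Returns how many of those attempts found a match.
    static int countMatches(String regex, String input, int iterations) {
        Pattern pattern = Pattern.compile(regex);
        int matched = 0;
        for (int i = 0; i < iterations; i++) {
            Matcher m = pattern.matcher(input);
            if (m.find()) {
                matched++;
            }
        }
        return matched;
    }

    // Wall-clock time in milliseconds for one run of countMatches.
    static long timeMillis(String regex, String input, int iterations) {
        long start = System.nanoTime();
        countMatches(regex, input, iterations);
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        int iterations = 10000;
        // Sample inputs are assumptions, not the strings used in regtest.java.
        String[][] cases = {
            { URL_REGEX, "http://www.example.com:8080/index.html" },
            { USD_REGEX, "usd +1234.56" },
            { DUP_REGEX, "this is is a test" },
        };
        for (String[] c : cases) {
            long ms = timeMillis(c[0], c[1], iterations);
            System.out.println(c[0] + " : " + ms + "ms for " + iterations + " iterations");
        }
    }
}
```

Numbers from a loop like this will swing with JIT warm-up and the machine it runs on, so treat them as a rough comparison and not as a replacement for the figures in the table.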
If you know of any other Java regular expression libraries, please do let me know as well :)