While searching for the best regular expression (regex) library available for Java, I ran some benchmarks to satisfy myself. The table below shows typical results for matching the following four regular expressions (written as Java string literals):
"^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)", "(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)", "usd [+-]?[0-9]+.[0-9][0-9]" and "\\b(\\w+)(\\s+\\1)+\\b".
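As a rough illustration of what the four patterns above match, here is a minimal sketch using java.util.regex (one of the libraries in the table). The sample inputs and the names RegexSamples and firstMatch are made up for illustration; they are not taken from the benchmark source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSamples {
    // The four benchmark patterns, copied from the text as Java string literals.
    static final String[] PATTERNS = {
        "^(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)",   // anchored URL pattern
        "(([^:]+)://)?([^:/]+)(:([0-9]+))?(/.*)",    // same pattern without the ^
        "usd [+-]?[0-9]+.[0-9][0-9]",                // currency amount
        "\\b(\\w+)(\\s+\\1)+\\b"                     // repeated word (backreference)
    };

    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, chosen only to exercise each pattern.
        String url = "http://www.example.com:8080/index.html";
        System.out.println(firstMatch(PATTERNS[0], url));  // prints the whole URL
        System.out.println(firstMatch(PATTERNS[1], url));  // prints the whole URL

        // The '.' is unescaped in the pattern, so it matches any character,
        // not only a decimal point.
        System.out.println(firstMatch(PATTERNS[2], "price: usd 1234.00 total"));  // usd 1234.00

        // \1 refers back to whatever the first group (\w+) captured.
        System.out.println(firstMatch(PATTERNS[3], "this is is a test"));  // is is
    }
}
```

The backreference in the last pattern is the feature that most of the automaton-based libraries cannot express, which is consistent with the failures noted in the table.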
NEW: I have rerun the benchmarks with the latest versions of the packages I could find, and updated the results accordingly. The large string block is only run 10 times per regular expression, as it takes over 1 second for some libraries.
|Regular expression library||Time for 10,000 iterations (JDK 1.6.x)||Time for 10,000 iterations (JDK 1.4.2_19)||Parses Perl 5 regex syntax correctly?|
|org.apache.regexp.* 1.5||14372ms||25590ms||false - fails on last regex on all strings|
|com.stevesoft.pat.Regex 1.5.3||17514ms||61702ms||false - fails on last regex on last string|
|com.ibm.regex.RegularExpression 1.0.2||7950ms||20524ms||false - fails on last regex on last string|
|kmy.regex.util.Regex 0.1.2||1317ms||641ms||false - fails on last regex on last string|
|java.util.regex.Pattern 1.4||6556ms||12117ms||false - fails on last regex on last string|
|RegularExpression.RE 1.1||1400ms||6025ms||false - fails on backreference and URL tests|
|gnu.rex.Rex 0.0||11ms||13ms||false - fails on backreference and URL tests|
|dk.brics.automaton.RegExp 1.7-1||353ms||421ms||false - though if you drop the ^, it works well!|
|com.karneim.util.collection.regex.Pattern 1.1.1||321ms||402ms||false - fails on URL test|
|monq.jfa.Regexp 1.1.1||327ms||n/a (requires Java 1.5+)||false - fails on backreference test|
|com.ibm.icu.text.UnicodeSet (ICU4J) 4.4.1||36ms||n/a (requires Java 1.5+)||false - fails on backreference test|
|gnu.regexp.RE version 1.1.4||2084ms||19435ms||true|
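For readers who want a feel for how timings like these can be gathered, here is a minimal sketch of a timing loop for java.util.regex only. It is not the regtest.java used for the table; the class and method names (RegexTiming, countMatches, timeMillis), the warm-up pass and the sample input are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

class RegexTiming {
    // Runs find() `iterations` times against input and returns how many attempts matched.
    static int countMatches(Pattern pattern, String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (pattern.matcher(input).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Returns the elapsed wall-clock time in milliseconds for `iterations` match attempts.
    static long timeMillis(Pattern pattern, String input, int iterations) {
        long start = System.currentTimeMillis();
        int hits = countMatches(pattern, input, iterations);
        long elapsed = System.currentTimeMillis() - start;
        // Consume the result so the loop cannot be optimised away as dead code.
        if (hits < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Compile once, outside the timed loop, so only matching is measured.
        Pattern pattern = Pattern.compile("usd [+-]?[0-9]+.[0-9][0-9]");
        String input = "the total is usd 1234.00 including tax";  // hypothetical input

        timeMillis(pattern, input, 10000);                 // warm-up pass, result discarded
        long ms = timeMillis(pattern, input, 10000);       // measured pass
        System.out.println("10,000 iterations took " + ms + "ms");
    }
}
```

Whether compilation is included in the timed region, and whether a warm-up pass is used, can change the numbers considerably, so results from a sketch like this are not directly comparable with the table.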
Anders Møller gave some hints for improving the speed of the dk.brics.automaton.RegExp package, making it even faster!
Bryan Davis has contributed a method to further break down the benchmark details. Feel free to take the current Java source, regtest.java, for this update.
This test was run on a Core 2 Quad Q8200 (2.33 GHz) with 8 GB of RAM using j2sdk1.6.0_20 (and j2sdk1.4.2_19 for comparison). Please run the test on your own machine, and I will post the details here once I work out a good way of analysing the data (if you know of a way to produce an easier output format for the tests, please feel free to make the changes and email them to me). The source code requires the jars from all of the above regular expression libraries, which can be found by clicking the links in the above table. A simple compile.bat/run.sh will compile and run the tests for you, provided you have all the jars in the current working directory.
If you know of any other Java regular expression libraries, please do let me know as well :)