commons-user mailing list archives

From "Daniel F. Savarese" <...@savarese.org>
Subject Re: offtopic: regexp or ORO?
Date Fri, 09 Jan 2004 04:50:47 GMT

In message <20040107003142.68033.qmail@web60809.mail.yahoo.com>, David Graham writes:
>Those benchmarks are in line with some I performed for Validator. 
>Validator uses ORO but when I replaced it with Java 1.4 regexs I got a 2x
>speed improvement.  ORO works well (if not slowly) for Validator.

Why oh why does no one ever donate their benchmark code to help build
a better product? :)  (I know you probably wrote some quick tests, so
I'm making a general plea rather than one directed at you David.)
Seriously, it would provide an incentive to move out of maintenance mode.
If ORO broke backward compatibility and took advantage of J2SE 1.4 features,
I'm pretty sure the Perl stuff could match java.util.regex on average.  But
what's the point ...  Which is really why I'm chiming in on this thread.
Do Java regular expression users see ORO and Regexp mainly as vehicles for
supporting pre-J2SE 1.4 code (and possibly J2ME; both can be
made to work with J2ME with minor code changes)?  Should they
stay on the shelf in maintenance mode or is there any reason
to continue enhancing them?  Even though there are a lot of
directions they can go in, it doesn't seem like anyone has any
itches left to scratch.

To answer the original question:  if you need Perl (including zero-width
negative lookahead assertions), AWK, or glob expressions, use ORO.  If
you need POSIX-like expressions, use Regexp.  If you don't care, then
establish some other criteria to make the decision, such as whichever
you feel is easier to use.  Microbenchmarks like the one at
http://tusker.org/regex/regex_benchmark.html
are not very useful unless their patterns and data are characteristic of
what your application will use, because the performance of regular expression
libraries depends heavily on the patterns and input data.  For
example, in that benchmark, ORO beats java.util.regex on the second
pattern when I run it:
------------------------------------------
Regular expression library: java.util.regex.Pattern
RE: usd [+-]?[0-9]+.[0-9][0-9]
  MS    MAX     AVG     MIN     DEV     INPUT
  27    1       0.0027  0       0       'http://www.linux.com/'
  61    1       0.0061  0       0       'http://www.thelinuxshow.com/main.php3'
  114   4       0.0114  0       0       'usd 1234.00'
  132   4       0.0132  0       0       'he said she said he said no'
------------------------------------------
------------------------------------------
Regular expression library: org.apache.oro.text.regex.Perl5Matcher
RE: usd [+-]?[0-9]+.[0-9][0-9]
  MS    MAX     AVG     MIN     DEV     INPUT
  18    1       0.0018  0       0       'http://www.linux.com/'
  35    1       0.0035  0       0       'http://www.thelinuxshow.com/main.php3'
  85    1       0.0087  0       0       'usd 1234.00'
  108   1       0.0116  0       0       'he said she said he said no'
------------------------------------------

The total time (which is what the benchmark uses) for
java.util.regex is 334 and for ORO is 256.  If you only
ran that, you might conclude that ORO is 1.35X faster than
java.util.regex.  Nonetheless, I have no doubt that java.util.regex
is on average faster on J2SE 1.4 than libraries that predate J2SE 1.4.
And I trust David Graham's assessment with Validator.  I'm just suggesting
that you (Simon) be careful about this benchmark because it uses a very
limited number of patterns and input.  All that said, it would be great
to have a configurable benchmark to test ORO and Regexp in order to
isolate use cases where their performance can be improved.  But does
anybody care anymore?
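
As a rough idea of what a configurable benchmark could look like, here
is a minimal sketch that takes the patterns and inputs as parameters and
reports a total time per pattern, the same figure the benchmark above
uses.  The class name RegexBench, the methods timeNanos and run, and the
iteration count are all made up for illustration.  It only exercises
java.util.regex; timing ORO or Regexp would need a small adapter around
each library's matcher, which is not shown.  It also uses
System.nanoTime and generics, so it assumes a JDK newer than J2SE 1.4.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: names and structure are illustrative only.
class RegexBench {
    // Written on every run so the JIT cannot discard the matching work.
    static volatile int sink;

    // Time `iterations` find() calls of one compiled pattern on one input.
    static long timeNanos(Pattern pattern, String input, int iterations) {
        Matcher m = pattern.matcher(input);
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            m.reset(input);
            if (m.find()) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = hits;
        return elapsed;
    }

    // Total time per pattern, summed over all inputs, in insertion order.
    static Map<String, Long> run(List<String> regexes, List<String> inputs,
                                 int iterations) {
        Map<String, Long> totals = new LinkedHashMap<String, Long>();
        for (String regex : regexes) {
            Pattern p = Pattern.compile(regex);
            long total = 0;
            for (String input : inputs) {
                timeNanos(p, input, iterations);          // warm-up pass, discarded
                total += timeNanos(p, input, iterations); // measured pass
            }
            totals.put(regex, Long.valueOf(total));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Replace these with patterns and data taken from your application.
        List<String> regexes = Arrays.asList("usd [+-]?[0-9]+.[0-9][0-9]");
        List<String> inputs = Arrays.asList(
            "http://www.linux.com/",
            "http://www.thelinuxshow.com/main.php3",
            "usd 1234.00",
            "he said she said he said no");
        for (Map.Entry<String, Long> e : run(regexes, inputs, 10000).entrySet()) {
            System.out.println(e.getValue() / 1000000.0 + " ms total for RE: "
                + e.getKey());
        }
    }
}
```

The point of passing the patterns and inputs in is that you can feed it
data characteristic of your own application instead of relying on
a fixed set.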

daniel


