harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yang Paulex" <paulex.y...@gmail.com>
Subject Re: [classlib][regex] Performance of regular expression is 30% of RI
Date Tue, 04 Sep 2007 09:57:28 GMT
2007/9/4, Jimmy,Jing Lv <firepure@gmail.com>:
>
> Hi,
>
>     A micro benchmark[1] on regex shows our implementation is much
> slower than RI, it cost Harmony ~23 seconds while RI ~7 seconds to
> complete the test. I use windows XP, drlvm and Harmony M2 build, and
> then replace regex classes with RI's while testing RI.
>
>     Regular expression is very improtant and core classlib for
> Harmony, make it performs well will do benefit to Harmony performance.
> We may find some way to make it faster.
>
>     According to Spec, regex use NFA algorithm[2] which is a little
> different with perl regular expression. As a result, we can not change
> the basic algorithm to DFA to improve performance. However still
> there's alternative algorithm of NFA. I googled open source regular
> experssion implementation, it is said that PCRE[3](Perl Compatible
> Regular Expressions) algorithm is of very high performance and luckily
> it is BSD licensed!
>     However the source is so large and C-written, and a quick glance
> at complie function I find it rather complex,  And java regular
> expression is different from perl's so that a directly wrapping will
> be surely un-workable.


Agreed, I don't think it's a good idea to implement regex in JNI, the
Java-native bridge penalty may be too high for regex.


But we still may read its implementation and
> find out some key point to improve, or some one already has some idea
> on tuning?


It would be interesting to know the hotspot of our  current implementations.


    I'm not a regular expression expert, please correct me if I miss
> something. Any comments/suggestions?
>
> [1]
>     public void test_Benchmark() throws Exception {
>         // warn up phase
>         for (int i = 0; i < 10000; i++) {
>             Pattern p = Pattern.compile
> ("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
>             Matcher m = p.matcher("aaaaab");
>             m.matches();
>         }
>
>         // benchmark start
>         long time = System.currentTimeMillis();
>         for (int i = 0; i < 1000000; i++) {
>             Pattern p = Pattern.compile
> ("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
>             Matcher m = p.matcher("aaaaab");
>             m.matches();
>         }
>         System.out.println(System.currentTimeMillis() - time);
>     }
>
> [2] file:///C:/Spec/docs/api/java/util/regex/Pattern.html
> [3] http://www.pcre.org/
> --
>
> Best Regards!
>
> Jimmy, Jing Lv
> China Software Development Lab, IBM
>



-- 
Paulex Yang
China Software Development laboratory
IBM

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message