harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Leo Li" <liyilei1...@gmail.com>
Subject Re: [classlib][regex] Performance of regular expression is 30% of RI
Date Tue, 04 Sep 2007 10:07:46 GMT
On 9/4/07, Yang Paulex <paulex.yang@gmail.com> wrote:
> 2007/9/4, Jimmy,Jing Lv <firepure@gmail.com>:
> >
> > Hi,
> >
> >     A micro benchmark[1] on regex shows our implementation is much
> > slower than RI, it cost Harmony ~23 seconds while RI ~7 seconds to
> > complete the test. I use windows XP, drlvm and Harmony M2 build, and
> > then replace regex classes with RI's while testing RI.
> >
> >     Regular expression is very improtant and core classlib for
> > Harmony, make it performs well will do benefit to Harmony performance.
> > We may find some way to make it faster.
> >
> >     According to Spec, regex use NFA algorithm[2] which is a little
> > different with perl regular expression. As a result, we can not change
> > the basic algorithm to DFA to improve performance. However still
> > there's alternative algorithm of NFA. I googled open source regular
> > experssion implementation, it is said that PCRE[3](Perl Compatible
> > Regular Expressions) algorithm is of very high performance and luckily
> > it is BSD licensed!
> >     However the source is so large and C-written, and a quick glance
> > at complie function I find it rather complex,  And java regular
> > expression is different from perl's so that a directly wrapping will
> > be surely un-workable.
>
>
> Agreed, I don't think it's a good idea to implement regex in JNI, the
> Java-native bridge penalty may be too high for regex.
>
>
> But we still may read its implementation and
> > find out some key point to improve, or some one already has some idea
> > on tuning?
>
>
> It would be interesting to know the hotspot of our  current implementations.

   The interesting thing is that even Pattern.compile("a"), such an
simple pattern, cost more than 2 times on harmony than on RI. And I
have traced the steps of the case, there is no loop, recursive or so
on. Seems it is just we do more than RI.
   I am trying to make further study by profiling.
>
>
>    I'm not a regular expression expert, please correct me if I miss
> > something. Any comments/suggestions?
> >
> > [1]
> >     public void test_Benchmark() throws Exception {
> >         // warn up phase
> >         for (int i = 0; i < 10000; i++) {
> >             Pattern p = Pattern.compile
> > ("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
> >             Matcher m = p.matcher("aaaaab");
> >             m.matches();
> >         }
> >
> >         // benchmark start
> >         long time = System.currentTimeMillis();
> >         for (int i = 0; i < 1000000; i++) {
> >             Pattern p = Pattern.compile
> > ("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
> >             Matcher m = p.matcher("aaaaab");
> >             m.matches();
> >         }
> >         System.out.println(System.currentTimeMillis() - time);
> >     }
> >
> > [2] file:///C:/Spec/docs/api/java/util/regex/Pattern.html
> > [3] http://www.pcre.org/
> > --
> >
> > Best Regards!
> >
> > Jimmy, Jing Lv
> > China Software Development Lab, IBM
> >
>
>
>
> --
> Paulex Yang
> China Software Development laboratory
> IBM
>


-- 
Leo Li
China Software Development Lab, IBM

Mime
View raw message