harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy,Jing Lv" <firep...@gmail.com>
Subject [classlib][regex] Performance of regular expression is 30% of RI
Date Tue, 04 Sep 2007 09:50:06 GMT
Hi,

    A micro benchmark[1] on regex shows our implementation is much
slower than RI, it cost Harmony ~23 seconds while RI ~7 seconds to
complete the test. I use windows XP, drlvm and Harmony M2 build, and
then replace regex classes with RI's while testing RI.

    Regular expression is very improtant and core classlib for
Harmony, make it performs well will do benefit to Harmony performance.
We may find some way to make it faster.

    According to Spec, regex use NFA algorithm[2] which is a little
different with perl regular expression. As a result, we can not change
the basic algorithm to DFA to improve performance. However still
there's alternative algorithm of NFA. I googled open source regular
experssion implementation, it is said that PCRE[3](Perl Compatible
Regular Expressions) algorithm is of very high performance and luckily
it is BSD licensed!
    However the source is so large and C-written, and a quick glance
at complie function I find it rather complex,  And java regular
expression is different from perl's so that a directly wrapping will
be surely un-workable. But we still may read its implementation and
find out some key point to improve, or some one already has some idea
on tuning?

    I'm not a regular expression expert, please correct me if I miss
something. Any comments/suggestions?

[1]
    public void test_Benchmark() throws Exception {
        // warn up phase
        for (int i = 0; i < 10000; i++) {
            Pattern p = Pattern.compile("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
            Matcher m = p.matcher("aaaaab");
            m.matches();
        }

        // benchmark start
        long time = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            Pattern p = Pattern.compile("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
            Matcher m = p.matcher("aaaaab");
            m.matches();
        }
        System.out.println(System.currentTimeMillis() - time);
    }

[2] file:///C:/Spec/docs/api/java/util/regex/Pattern.html
[3] http://www.pcre.org/
-- 

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Mime
View raw message