harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From enh <...@google.com>
Subject Re: A question about android regex implementation
Date Wed, 07 Jul 2010 18:07:39 GMT
On Wed, Jul 7, 2010 at 09:55, Jesse Wilson <jessewilson@google.com> wrote:
> On Mon, Jul 5, 2010 at 8:02 PM, Deven You <devyoudw@gmail.com> wrote:
> > I have written some simple benchmarks for harmony regex and find the
> > performance of harmony is poor compared to RI. For example, Mathcer.find()
> > only reach 60% of that of RI.

i'd be interested to see your benchmark code. our regular expression
benchmarks actually focus on replacing use of Pattern/Matcher with
simpler fast paths where the full complexity isn't needed. for
example, most String.split calls don't actually use regular
expressions. i'm not sure how much of this work made it into froyo,
and it's certainly still a work in progress.

one thing we want to do is use http://code.google.com/p/caliper/
benchmarks to track the relative performance of Java implementations
of the stuff that's currently native code (but doesn't inherently need
to be). we don't have anything like that for regular expressions yet.
(our tool for conveniently running benchmarks is
http://code.google.com/p/vogar/ and you can find some of our actual
benchmarks at http://code.google.com/p/dalvik/.)

> > I heard Android use icu4jni re-implement this
> > module. Since icu4jni use native code I think it may has higher performance
> > than harmony. I am trying to use icu4jni as the back-end of harmony regex
> > but find icu4jni has no functions related to regex operations.

although, as jesse said, regular expressions are a special case where
there was no original icu4jni implementation, icu4jni is a dead
project (http://site.icu-project.org/#TOC-ICU4JNI). Android used the
icu4jni code, but it's been gradually rewritten since then. the
process is incomplete, and the extent of the rewrite varies from class
to class, but you're probably better off taking code from Android
rather than icu4jni.

> > I know there are some android guys in our community. So can anyone tell me
> > some detail info for android's regex, like if it re-implement the regex
> > logic using native code by android itself rather than icu4jni and really get
> > higher performance compared to harmony regex? Thanks a lot!
>
> We actually use icu4c's pattern, which we access via JNI. Documentation is
> here:
>  http://icu-project.org/apiref/icu4c/classRegexPattern.html
>
> Our regex-java integration code is Apache-licensed and available in our Git
> repository here:
>
> http://android.git.kernel.org/?p=platform/libcore.git;a=tree;f=regex/src/main/java/java/util/regex;h=f9207d043a24d3dc034bf2cc3a5fdd57115692c3;hb=master

actually, in shipping versions of Android we use the C API rather than
the C++ API. see
http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=icu/src/main/native/NativeRegEx.cpp;h=7b3cafc0779ef7f3ed69a32128af7f96e3498922;hb=master
for the gory details.

i've recently rewritten this to use the C++ APIs, and it's much
cleaner for it. if you look at the current code you'll see that we
copy the input to the native heap, because icu4c < 4.6 can't cope with
its input moving about between calls (and a VM can't necessarily
guarantee the Java char[] doesn't move around).

in future, we'll use a new icu4c API to get the best of both worlds:

http://sourceforge.net/mailarchive/forum.php?thread_name=AANLkTim7czTaFNEBQm8tO3LhdDingvS3pshpW79XBIhu@mail.gmail.com&forum_name=icu-design

if you're shipping your own copy of icu (rather than using some system
one you have no control over), you might want to backport that change
(which has been submitted).

one final thing to be aware of is that icu4c's regular expression
implementation isn't 100% compatible with the RI. most obviously,
there's no support for CANON_EQ, but there are a variety of more
subtle incompatibilities too.

--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/

Mime
View raw message