harmony-dev mailing list archives

From "Vladimir Strigun" <vstri...@gmail.com>
Subject Re: [contribution] Contribution of charset encoders/decoders for NIO_CHAR module
Date Mon, 09 Apr 2007 10:33:32 GMT
On 4/9/07, Yang Paulex <paulex.yang@gmail.com> wrote:
> 2007/4/9, Vladimir Strigun <vstrigun@gmail.com>:
> >
> > Hi all!
> >
> > I'm happy to announce one more contribution to Harmony on behalf of
> > Intel. The provided implementation of charset encoders/decoders is
> > intended to replace the ICU-based charset encoding/decoding
> > operations. The code was developed in a clean-room environment inside
> > Intel, and I'd like you to play with it and include it in the current
> > Harmony tree.
> >
> > The package can be found here:
> > HARMONY-3593
> >
> > The algorithms for charset encoding/decoding differ from those of
> > ICU. All charsets are generated from current Harmony or any other
> > implementation of Java and can be properly integrated into the current
> > nio_char module. The archive contains source files for 6 charsets:
> > GB18030, US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE, UTF-16LE;
> > an implementation of CharsetProvider; a generator for other charsets;
> > and a native part. I've tested the package with more than 90 charsets,
> > and all benchmarks and tests passed with the new bundle. Additionally, I
> > see a significant boost for the Dacapo.antlr and Dacapo.xalan benchmarks
> > with the current Harmony tree on DRLVM and IBM VM. On DRLVM I get a 2.5x
> > boost for antlr and ~5-8x for xalan.
> >
> > The main advantages of the package are the following:
> >   - Code for every charset is generated by CharsetGenerator; thus, if
> > some modification is necessary, we just need to correct the generator
> > and re-generate all sources.
> >   - We use 2 different encoders and decoders, for java and direct
> > buffers. Most applications use java heap buffers, and unlike the
> > existing implementation, this one doesn't produce lots of native calls
> > to perform encoding/decoding operations on java buffers, thus
> > significantly improving performance. This is the main reason why we
> > have such a significant boost for Dacapo.
> >   - Charset tables for encoding/decoding are stored in the appropriate
> > classes.
> >
> > Since the package contains implementations for 6 charsets only,
> > documentation on how to generate and build additional charsets can be
> > found in the README file in the contributed package.
> >
> > Please do not hesitate to contact me for more details.
> >
> > Thanks,
> > Vladimir.
> >
>
> Good work, Vladimir and team at Intel!
>
> I'm also interested in a pure Java charset conversion provider for Harmony,
> because the frequent JNI invocations in ICU4JNI (the current Harmony charset
> provider) may impair performance when dealing with small chunks of bytes.
> But I noticed that, in this contribution, US_ASCII, ISO_8859_1 and GB18030
> are implemented in native C. Just out of interest, is there any special
> reason not to implement them in Java?

As I wrote earlier, 2 branches of code are generated for every
encoder/decoder: a java one and a native one. The native branch is used
only for processing native (direct) byte buffers. It could easily be
removed with a small modification of the generators, but performance
measurements show that it's better to use the native decoders/encoders
for native buffers.
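
To illustrate the two-branch idea, here is a minimal sketch of how a decoder
might choose a branch depending on the buffer type. It is not the code from
the contributed package: the class and method names are hypothetical,
ISO-8859-1 is picked only because its mapping is trivial, and the "native"
branch is imitated in plain Java by a bulk copy, whereas the contribution
uses real native code there.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical sketch; names are illustrative and not taken from HARMONY-3593.
final class Latin1DecoderSketch {

    // Java branch: heap buffers expose their backing array, so decoding is a
    // plain array loop with no native calls.
    static void decodeHeap(ByteBuffer in, CharBuffer out) {
        byte[] src = in.array();
        int offset = in.arrayOffset();
        int pos = offset + in.position();
        int end = offset + in.limit();
        while (pos < end && out.hasRemaining()) {
            // ISO-8859-1 maps each byte value 0..255 to the same code point.
            out.put((char) (src[pos++] & 0xFF));
        }
        in.position(pos - offset);
    }

    // Stand-in for the native branch (assumption: the real one is native code).
    // Here the bytes are pulled out of the buffer with a single bulk get.
    static void decodeWithoutArray(ByteBuffer in, CharBuffer out) {
        int n = Math.min(in.remaining(), out.remaining());
        byte[] chunk = new byte[n];
        in.get(chunk);
        for (byte b : chunk) {
            out.put((char) (b & 0xFF));
        }
    }

    // Dispatch: buffers with an accessible array take the java branch,
    // all others (direct or read-only buffers) take the other branch.
    static void decode(ByteBuffer in, CharBuffer out) {
        if (in.hasArray()) {
            decodeHeap(in, out);
        } else {
            decodeWithoutArray(in, out);
        }
    }

    // Convenience wrapper that decodes all remaining bytes into a String.
    static String decodeToString(ByteBuffer in) {
        CharBuffer out = CharBuffer.allocate(in.remaining());
        decode(in, out);
        out.flip();
        return out.toString();
    }
}
```

Both ByteBuffer.wrap(bytes) and a buffer from ByteBuffer.allocateDirect(n)
can be passed to decodeToString; only the branch taken differs.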

Thanks.
Vladimir.

> --
> Paulex Yang
> China Software Development laboratory
> IBM
>
