harmony-dev mailing list archives

From "Vladimir Strigun" <vstri...@gmail.com>
Subject [contribution] Contribution of charset encoders/decoders for NIO_CHAR module
Date Mon, 09 Apr 2007 08:14:10 GMT
Hi all!

I'm happy to announce one more contribution to Harmony on behalf of
Intel. The provided implementation of charset encoders/decoders is
intended to replace the ICU-based charset encoding/decoding
operations. The code was developed in a clean-room environment inside
Intel, and I'd like you to try it out and include it in the current
Harmony tree.

The package can be found here:
HARMONY-3593

The encoding/decoding algorithms differ from those of ICU. All
charsets are generated from the current Harmony or any other Java
implementation, and can be properly integrated into the current
nio_char module. The archive contains source files for the following
charsets: GB18030, US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE,
UTF-16LE; an implementation of CharsetProvider; a generator for other
charsets; and the native part. I've tested the package with more than
90 charsets, and all benchmarks and tests passed with the new bundle.
Additionally, I see a significant speedup on the Dacapo.antlr and
Dacapo.xalan benchmarks with the current Harmony tree on both DRLVM and
IBM VM. On DRLVM, antlr runs about 2.5x faster and xalan ~5-8x faster.
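The idea that charsets are generated from an existing Java implementation can be illustrated for the single-byte case. The following is a minimal sketch, not the contributed CharsetGenerator: `SingleByteTableBuilder` and `buildDecodeTable` are hypothetical names, and the approach only applies to single-byte charsets (not to GB18030, UTF-8, or UTF-16). It asks the host JVM to decode each of the 256 byte values and records U+FFFD for any byte the host reports as malformed or unmappable.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper (not the contributed CharsetGenerator): derives a
// byte-to-char decode table for a single-byte charset from the host JVM.
final class SingleByteTableBuilder {
    // Sentinel stored for bytes the host charset cannot decode.
    static final char UNMAPPED = '\uFFFD';

    private SingleByteTableBuilder() {}

    static char[] buildDecodeTable(String charsetName) {
        CharsetDecoder dec = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        char[] table = new char[256];
        for (int b = 0; b < 256; b++) {
            try {
                // decode(ByteBuffer) resets the decoder and runs a complete decode.
                CharBuffer out = dec.decode(ByteBuffer.wrap(new byte[] {(byte) b}));
                table[b] = out.length() == 1 ? out.get(0) : UNMAPPED;
            } catch (CharacterCodingException e) {
                table[b] = UNMAPPED;
            }
        }
        return table;
    }
}
```

For example, `buildDecodeTable("ISO-8859-1")` maps every byte to the code point of the same value, while `buildDecodeTable("US-ASCII")` leaves bytes 0x80 to 0xFF as the sentinel.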

The main advantages of the package are the following:
  - The code for every charset is generated by CharsetGenerator, so if
some modification is necessary, we only need to correct the generator
and regenerate all sources.
  - We use two different encoders and decoders: one for Java heap
buffers and one for direct buffers. Most applications use Java heap
buffers, and unlike the existing implementation, ours does not make
lots of native calls to encode/decode them, which significantly
improves performance. This is the main reason for the large Dacapo
speedup.
  - Charset tables for encoding/decoding are stored in the corresponding
charset classes.
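The split between heap and direct buffers can be sketched as follows. This is an illustration under stated assumptions, not the contributed code: `TableDecodeLoop` is a hypothetical name, the charset is assumed to be single-byte with a 256-entry decode table, and U+FFFD is assumed to mark unmapped bytes. When both buffers expose backing arrays, the loop indexes the arrays directly; otherwise it falls back to per-character buffer calls. A `CharsetDecoder.decodeLoop` implementation could delegate to `decode(...)`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

// Hypothetical sketch (not the contributed code): one table-driven decode step
// for a single-byte charset, with separate heap-array and generic paths.
final class TableDecodeLoop {
    // Assumed sentinel marking bytes with no mapping in the table.
    static final char UNMAPPED = '\uFFFD';

    private TableDecodeLoop() {}

    static CoderResult decode(ByteBuffer in, CharBuffer out, char[] table) {
        // hasArray() is false for direct and read-only buffers.
        if (in.hasArray() && out.hasArray()) {
            return decodeArrays(in, out, table);
        }
        return decodeBuffers(in, out, table);
    }

    // Heap path: index the backing arrays directly, no per-byte buffer calls.
    private static CoderResult decodeArrays(ByteBuffer in, CharBuffer out, char[] table) {
        byte[] src = in.array();
        char[] dst = out.array();
        int sp = in.arrayOffset() + in.position();
        int sl = in.arrayOffset() + in.limit();
        int dp = out.arrayOffset() + out.position();
        int dl = out.arrayOffset() + out.limit();
        try {
            while (sp < sl) {
                char c = table[src[sp] & 0xFF];
                if (c == UNMAPPED) {
                    return CoderResult.unmappableForLength(1);
                }
                if (dp >= dl) {
                    return CoderResult.OVERFLOW;
                }
                dst[dp++] = c;
                sp++;
            }
            return CoderResult.UNDERFLOW;
        } finally {
            // Write progress back so both positions point at unconsumed data.
            in.position(sp - in.arrayOffset());
            out.position(dp - out.arrayOffset());
        }
    }

    // Generic path (e.g. direct buffers): absolute peek, then relative put.
    private static CoderResult decodeBuffers(ByteBuffer in, CharBuffer out, char[] table) {
        while (in.hasRemaining()) {
            char c = table[in.get(in.position()) & 0xFF];
            if (c == UNMAPPED) {
                return CoderResult.unmappableForLength(1);
            }
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;
            }
            out.put(c);
            in.position(in.position() + 1);
        }
        return CoderResult.UNDERFLOW;
    }
}
```

On underflow, overflow, or an unmappable byte, both paths leave the input position at the first unconsumed byte, which is what callers of `decodeLoop` expect.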

Since the package contains implementations of only the charsets listed
above, documentation on how to generate and build additional charsets
can be found in the README file in the contributed package.
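To give a rough idea of what generating an additional charset involves, here is a minimal sketch of one possible final step of such a generator: emitting Java source for a class that holds a decode table as a constant, in line with storing each charset's tables in its own class. `TableSourceGenerator`, the `DECODE_TABLE` field, and the package name in the example are hypothetical; the README in the contributed package describes the actual procedure.

```java
import java.util.Locale;

// Hypothetical sketch (not the contributed CharsetGenerator): renders a
// decode table as Java source for a class holding it as a constant.
final class TableSourceGenerator {
    private TableSourceGenerator() {}

    static String generate(String packageName, String className, char[] table) {
        StringBuilder sb = new StringBuilder();
        sb.append("package ").append(packageName).append(";\n\n");
        sb.append("final class ").append(className).append(" {\n");
        sb.append("    static final char[] DECODE_TABLE = {");
        for (int i = 0; i < table.length; i++) {
            // Eight entries per line; hex casts avoid escaping special characters.
            sb.append(i % 8 == 0 ? "\n        " : " ");
            sb.append(String.format(Locale.ROOT, "(char) 0x%04X,", (int) table[i]));
        }
        sb.append("\n    };\n\n");
        sb.append("    private ").append(className).append("() {}\n");
        sb.append("}\n");
        return sb.toString();
    }
}
```

For example, `generate("org.example.gen", "SketchTables", table)` returns a compilable source file; the trailing comma after the last entry is legal in a Java array initializer.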

Please do not hesitate to contact me for more details.

Thanks,
Vladimir.
