harmony-dev mailing list archives

From Oliver Deakin <oliver.dea...@googlemail.com>
Subject Re: [classlib][icu] Bringing ICU level up to 3.8
Date Fri, 19 Oct 2007 10:12:22 GMT
Thanks for these results Alexei - it's interesting to see that icu4j 
does not lag far behind icu4jni even on such a large conversion.
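
For anyone who wants to try a similar comparison locally, here is a rough sketch of the kind of measurement quoted below. It is not Alexei's benchmark source (that is at the link in his mail). The class name, the helper names and the synthetic input text are made up for illustration, and it assumes the running class library provides a "windows-1251" charset. It also does no JIT warm-up, so treat the numbers as indicative only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class ConversionTiming {
    /** Decodes the same bytes 'rounds' times and returns the elapsed milliseconds. */
    static long timeDecode(Charset cs, byte[] input, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            cs.decode(ByteBuffer.wrap(input));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Encodes the same text 'rounds' times and returns the elapsed milliseconds. */
    static long timeEncode(Charset cs, String text, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            cs.encode(CharBuffer.wrap(text));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Encodes and then decodes the text once, as a sanity check on the charset. */
    static String roundTrip(Charset cs, String text) {
        ByteBuffer bytes = cs.encode(text);
        return cs.decode(bytes).toString();
    }

    public static void main(String[] args) {
        // Assumption: a charset named "windows-1251" is available at runtime.
        Charset cs = Charset.forName("windows-1251");

        // Synthetic stand-in for the book: "War and Peace" in Russian,
        // written with Unicode escapes and repeated to get a large input.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("\u0412\u043e\u0439\u043d\u0430 \u0438 \u043c\u0438\u0440. ");
        }
        String text = sb.toString();
        byte[] bytes = text.getBytes(cs);

        System.out.println("<" + cs.newDecoder().getClass().getName() + "> Decoding time: "
                + timeDecode(cs, bytes, 10) + " millis");
        System.out.println("<" + cs.newEncoder().getClass().getName() + "> Encoding time: "
                + timeEncode(cs, text, 10) + " millis");
    }
}
```

Which provider actually serves the charset depends on how the class library is configured, so printing the decoder and encoder class names makes it clear what was measured.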

I have discovered that ICU4J 3.8 does not currently support ISO-2022 
charsets [1], which causes one test 
(tests.api.java.io.InputStreamReaderTest.test_read()) to fail. This 
should only be temporary, and I do not see it as a major issue. 
However, I am not familiar with this charset and so cannot fully 
gauge the impact of its absence on the community. Would this lack of 
support be a problem?
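
If you want to see whether your own setup is affected, a quick check along these lines should do. This is only a sketch: the class and method names are hypothetical, and the ISO-2022 variant names listed in main are examples I picked, not a list taken from the ICU ticket.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class CharsetSupportCheck {
    /** Returns the subset of the given charset names that the runtime cannot provide. */
    static List<String> missing(String... names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            // Charset.isSupported consults every installed charset provider.
            if (!Charset.isSupported(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example names only; adjust to the encodings your application relies on.
        List<String> unsupported = missing("ISO-2022-JP", "ISO-2022-KR", "ISO-2022-CN");
        System.out.println("Unsupported charsets: " + unsupported);
    }
}
```

An empty list means the names you asked about are all available from whatever providers the class library exposes.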

If the short-term lack of ISO-2022 support is not a problem, then I'd 
like to move ahead and use icu4j 3.8 exclusively, removing the icu4jni 
and icu4c dependencies from classlib. I will give it a couple of days and, 
if there are no objections, I will go ahead and apply the changes required.

Regards,
Oliver

[1] http://bugs.icu-project.org/trac/ticket/5791

Alexei Zakharov wrote:
> Hi Oliver,
>
> I've created a small benchmark too. It takes Leo Tolstoy's "War and
> Peace" Book One as input and converts it from Russian CP-1251 to
> UTF-16 (10 times) and back (also 10 times). You may find the
> benchmark's source code and a build file at [1].  The first difference
> from your benchmark is the language & encoding - Russian in my case.
> The second difference is the set of tested VMs - I've run the
> benchmark on RI, J9 and DRLVM.
>
> You may find the results below. BTW, the results show that in this
> particular test our internal providers (from the
> org.apache.harmony.niochar.charset package) are faster than both
> versions of ICU. Another interesting fact is the terrible ICU decoding
> performance on DRLVM, while on J9 it works rather fast. The bad
> performance on DRLVM is something that should be fixed IMO. And
> finally, yes, ICU4JNI is a little bit faster than ICU4J in this test.
> However, "War and Peace" is a rather big book (the paper version of the
> first part contains about 400 pages, so repeated 10 times = 4000
> pages), and yet the difference in the numbers is not so big.
>
> [1] http://people.apache.org/~ayza/icu_experiments/
>
>
> RI
> ---
> Built-in
> <sun.nio.cs.MS1251$Decoder> Decoding time: 571 millis
> <sun.nio.cs.MS1251$Encoder> Encoding time: 351 millis
>
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 430 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 551 millis
>
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 401 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 540 millis
>
> J9
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 231 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 430 millis
>
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 781 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 620 millis
>
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 561 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 371 millis
>
>
> DRLVM
> ---
> Built-in
> <org.apache.harmony.niochar.charset.CP_1251$Decoder> Decoding time: 351 millis
> <org.apache.harmony.niochar.charset.CP_1251$Encoder> Encoding time: 540 millis
>
> ICU4j
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6660 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 1071 millis
>
> ICU4JNI
> <com.ibm.icu.charset.CharsetMBCS$CharsetDecoderMBCS> Decoding time: 6179 millis
> <com.ibm.icu.charset.CharsetMBCS$CharsetEncoderMBCS> Encoding time: 451 millis
>
> With Best Regards,
> Alexei
>
> 2007/10/11, Oliver Deakin <oliver.deakin@googlemail.com>:
>   
>> Tony Wu wrote:
>>     
>>> On 10/8/07, Oliver Deakin <oliver.deakin@googlemail.com> wrote:
>>>       
>>>> Are there any particular
>>>> benchmarks you had in mind for this?
>>>>
>>>>
>>>>         
>>> ya, there is a micro benchmark on HARMONY-3709
>>>
>>>
>>>       
>> <SNIP!>
>>
>> I have run the micro benchmark on Harmony with its current ICU
>> configuration (icu4jni 3.4.4) and on Harmony with pure icu4j 3.8. The
>> results are pretty much as expected - for small jobs icu4j is
>> significantly faster, for large jobs icu4jni comes out on top (full
>> results at the end of this email). It seems that performance-wise there
>> are benefits on both sides depending on the work we are doing.
>>
>> Regards,
>> Oliver
>>     
>
>   

-- 
Oliver Deakin
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

