commons-dev mailing list archives

From Jörg Schaible <joerg.schai...@gmx.de>
Subject Re: [io] IBM JDK and broken UTF-16 (Related to release 2.5 RC 1)
Date Fri, 11 Dec 2015 13:35:41 GMT
Hi Kristian,

Kristian Rosenvold wrote:

> I've been digging deeply into the IBM JDK 6/7-related breakages in the IO
> 2.5 RC.
> 
> A lot of them can be explained by different capabilities of XML parsers in
> the different JDKs, and I have come up with a decent heuristic for
> detecting this and ignoring the tests.
> 
> There are also a couple of legacy oddball character sets supported by the
> IBM JDK that simply cannot round-trip the French string in the test case.
> (Nerdy side note: take a look at the 7-bit Japanese/Chinese encodings at
> https://en.wikipedia.org/wiki/ISO/IEC_2022 !) These can just be excluded
> from the test case.
> 
> But the UTF-16 decoder in IBM JDK 6 and 7 is simply broken when fed a
> single byte at a time (it works fine with a full byte array as input). This
> is bad news for WriterOutputStream, which is quite fundamentally based on
> outputting single bytes. While the other problems can be fixed by
> improving the test case, I really believe WriterOutputStream should just
> throw UnsupportedOperationException on IBM JDK 6/7 with UTF-16.
> 
> WDYT?
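On the test side, excluding the oddball charsets Kristian mentions could look roughly like the following minimal sketch: instead of hard-coding names such as the ISO-2022 variants, it asks every charset installed in the JVM whether a sample string survives an encode/decode cycle. The class name, the helper names and the sample string are hypothetical; they are not part of Commons IO or its actual test case.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public final class RoundTripCharsets {

    /** True if text survives an encode/decode cycle in cs unchanged. */
    public static boolean roundTrips(Charset cs, String text) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot be round-tripped at all
        }
        try {
            // getBytes(Charset) substitutes unmappable characters instead of throwing,
            // so a lossy charset shows up as a mismatch after decoding.
            byte[] encoded = text.getBytes(cs);
            return text.equals(new String(encoded, cs));
        } catch (RuntimeException e) {
            return false; // treat any vendor-specific encoder failure as "not supported"
        }
    }

    /** All charsets installed in this JVM that round-trip text. */
    public static List<Charset> roundTripCapable(String text) {
        List<Charset> result = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (roundTrips(cs, text)) {
                result.add(cs);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the real test case uses its own French string.
        String sample = "Voil\u00e0 l'\u00e9t\u00e9, gar\u00e7on !";
        List<Charset> usable = roundTripCapable(sample);
        System.out.println(usable.size() + " of " + Charset.availableCharsets().size()
                + " charsets round-trip the sample");
    }
}
```

A parameterized test could then iterate over roundTripCapable(sample) rather than over all available charsets, so vendor-specific legacy encodings drop out automatically.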

I would not do this by default. Simply add a static initializer that checks
for the IBM JDK and tests the UTF-16 functionality; only if that test fails
should we throw a UOE in the case above. If it succeeds, we might be running
on a newer JDK where the problem is fixed.
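A minimal sketch of that idea, assuming the vendor can be recognized from the java.vendor system property and that decoding a short UTF-16 sample one byte per CharsetDecoder.decode call (the way a byte-oriented stream delivers input) is a sufficient probe. The class, field and method names are hypothetical and are not the actual WriterOutputStream code; Charset.forName is used so the sketch also compiles on JDK 6.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public final class Utf16ByteWiseCheck {

    /** Computed once, when the class is initialized. */
    static final boolean UTF16_BYTEWISE_OK;

    static {
        // Assumption: only IBM JDKs are affected, so other vendors skip the probe.
        boolean ibm = System.getProperty("java.vendor", "").contains("IBM");
        UTF16_BYTEWISE_OK = !ibm || probe(Charset.forName("UTF-16"));
    }

    /** Feeds an encoded sample to a decoder one byte per decode() call and checks the result. */
    public static boolean probe(Charset charset) {
        String sample = "ABC\u00e9\u4e2d";
        byte[] encoded = sample.getBytes(charset);
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.allocate(encoded.length);
        CharBuffer out = CharBuffer.allocate(sample.length() * 2 + 8);
        try {
            for (byte b : encoded) {
                in.put(b);
                in.flip();
                CoderResult result = decoder.decode(in, out, false);
                if (result.isError()) {
                    return false;
                }
                in.compact(); // keep an incomplete code unit for the next call
            }
            in.flip();
            if (decoder.decode(in, out, true).isError() || decoder.flush(out).isError()) {
                return false;
            }
        } catch (RuntimeException e) {
            return false; // a decoder that throws counts as broken
        }
        out.flip();
        return sample.equals(out.toString());
    }

    /** Hypothetical guard a stream constructor could call; assumes UTF-16BE/LE behave like UTF-16. */
    public static void checkSupported(Charset charset) {
        if (!UTF16_BYTEWISE_OK && charset.name().startsWith("UTF-16")) {
            throw new UnsupportedOperationException(
                    charset.name() + " cannot be decoded one byte at a time on this JVM");
        }
    }
}
```

Because the probe runs once at class initialization, a newer IBM JDK with a fixed decoder passes it and never sees the exception, while other JVMs pay only for a system property lookup.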

Cheers,
Jörg



