harmony-dev mailing list archives

From Tim Ellison <t.p.elli...@gmail.com>
Subject Re: [classlib] String.toLowerCase/toUpperCase incorrect for supplementary characters (HARMONY-6649)
Date Thu, 16 Sep 2010 13:26:45 GMT
On 16/Sep/2010 14:16, Robert Muir wrote:
> On Thu, Sep 16, 2010 at 9:05 AM, Tim Ellison <t.p.ellison@gmail.com> wrote:
> 
>> On 16/Sep/2010 12:32, Robert Muir (JIRA) wrote:
>>> Does this mean harmony might need these methods for its own internal
>>> use before ICU is available?
>> Yes, String is used early in the bootstrapping, and having dependencies
>> on ICU functionality leads to an initialization circularity.
>>
>> i.e. if I simply implement String#toUpperCase(Locale) as
>> "return UCharacter.toUpperCase(locale, this)"
>> then we fail to boot with
>>
>> java.nio.charset.Charset (initialization failure)
>>     at java/lang/String.defaultCharset (String.java:736)
>>     at java/lang/String.<init> (String.java:232)
>>     at org/apache/harmony/luni/util/Util.toString (Util.java:102)
>>     at java/lang/System.getPropertyList (Native Method)
>>     at java/lang/System.ensureProperties (System.java:546)
>>     at java/lang/System.<clinit> (System.java:102)
>>
>> Likely because we use nio Charset in the String implementation, and that
>> in turn eventually calls String.toUpperCase() in CharsetProviderImpl
>> (lines 113 and 145).
>
> Really, I think this is a bug in CharsetProviderImpl.
> Because it calls toUpperCase() with the default Locale, if I am on a Turkish
> computer and I do Charset.forName("us-ascii"),
> it's going to uppercase it Turkish-style and it won't find the charset, but
> it will work fine almost anywhere else.
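To make the failure mode concrete, here is a small stand-alone sketch using only the JDK (it is not Harmony's CharsetProviderImpl, and the class name is made up). It passes the Turkish locale explicitly to simulate what a default-locale toUpperCase() would do on a Turkish machine, and compares the result with Locale.ENGLISH.

```java
import java.util.Locale;

public class TurkishCharsetNameDemo {
    // Uppercases a charset name with an explicit locale; a no-argument
    // toUpperCase() does the same thing using the default locale.
    static String upper(String name, Locale locale) {
        return name.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String name = "us-ascii";

        // Turkish rules map 'i' (U+0069) to dotted capital I (U+0130),
        // so the result no longer matches the canonical "US-ASCII".
        String tr = upper(name, turkish);
        // Locale.ENGLISH applies the plain ASCII mapping 'i' -> 'I'.
        String en = upper(name, Locale.ENGLISH);

        System.out.println("tr matches: " + tr.equals("US-ASCII")); // false
        System.out.println("en matches: " + en.equals("US-ASCII")); // true
        System.out.println(String.format("tr char 6: U+%04X", (int) tr.charAt(6))); // U+0130
    }
}
```

On a machine whose default locale is Turkish, the default-locale call behaves like the `tr` case above, which is why the charset table lookup misses.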

Yes, I agree (though I don't know what happens when it is uppercased
'properly' in Turkish).

> I think this should be using Locale.ENGLISH here, as it's a case-insensitive
> comparison, not intended for display.
> 
> Alternatively, something closer to Unicode's 'simple case folding' could be
> made available for situations like this: a method that simply iterates through
> the string and applies the locale-insensitive Character.toLowerCase(i) to each character.
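A rough sketch of that per-character idea, assuming a hypothetical helper named foldLower (not an existing Harmony or JDK method). It walks the string by code point, so supplementary characters (the subject of HARMONY-6649) are mapped as single units, and applies Character.toLowerCase(int). Note that this only approximates Unicode simple case folding, which is defined by CaseFolding.txt rather than by the lowercase mapping.

```java
public final class SimpleFold {
    // Hypothetical helper: locale-insensitive lowercasing, one code point at
    // a time. Approximates simple case folding via Character.toLowerCase(int).
    static String foldLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // A surrogate pair is read as one code point, so supplementary
            // characters such as Deseret letters are mapped correctly.
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same result whatever the default locale is.
        System.out.println(foldLower("US-ASCII")); // us-ascii

        // U+10400 DESERET CAPITAL LETTER LONG I lowercases to U+10428.
        String deseret = new String(Character.toChars(0x10400));
        int folded = foldLower(deseret).codePointAt(0);
        System.out.println(String.format("U+%04X", folded)); // U+10428
    }
}
```

Because it never consults a Locale, the Turkish dotless/dotted I problem cannot arise here.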

Nah, a Charset name is restricted to a strict subset of characters
[1] (essentially ASCII letters, digits and a few punctuation marks), so I
propose we just do the case transform directly in CharsetProviderImpl
without asking Character.
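A minimal sketch of what such a direct transform could look like, assuming a hypothetical helper toUpperAscii in a made-up class CharsetNameCase; this is not the actual CharsetProviderImpl code. Because legal charset names are ASCII-only, shifting 'a'..'z' by a constant is enough, with no locale, Character, or ICU tables involved.

```java
public final class CharsetNameCase {
    // Hypothetical helper: uppercases only 'a'..'z' and leaves every other
    // char untouched, so the result never depends on the default locale.
    static String toUpperAscii(String name) {
        char[] chars = name.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 'a' && c <= 'z') {
                chars[i] = (char) (c - ('a' - 'A')); // 'a' - 'A' == 32
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toUpperAscii("us-ascii"));   // US-ASCII
        System.out.println(toUpperAscii("utf-8"));      // UTF-8
        System.out.println(toUpperAscii("ISO-8859-1")); // ISO-8859-1
    }
}
```

Any non-ASCII character passes through unchanged; such a name is not a legal charset name anyway, so nothing is lost by leaving it alone.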

[1] http://download.oracle.com/javase/1.5.0/docs/api/java/nio/charset/Charset.html

Regards,
Tim
