harmony-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: [classlib] String.toLowerCase/toUpperCase incorrect for supplementary characters (HARMONY-6649)
Date Thu, 16 Sep 2010 13:16:10 GMT
On Thu, Sep 16, 2010 at 9:05 AM, Tim Ellison <t.p.ellison@gmail.com> wrote:

> On 16/Sep/2010 12:32, Robert Muir (JIRA) wrote:
> > Does this mean harmony might need these methods for its own internal
> > use before ICU is available?
>
> Yes, String is used early in the bootstrapping, and having dependencies
> on ICU functionality leads to an initialization circularity.
>
> i.e. if I simply implement String#toUpperCase(Locale) as
> "return UCharacter.toUpperCase(locale, this)"
> then we fail to boot with
>
> java.nio.charset.Charset (initialization failure)
>     at java/lang/String.defaultCharset (String.java:736)
>     at java/lang/String.<init> (String.java:232)
>     at org/apache/harmony/luni/util/Util.toString (Util.java:102)
>     at java/lang/System.getPropertyList (Native Method)
>     at java/lang/System.ensureProperties (System.java:546)
>     at java/lang/System.<clinit> (System.java:102)
>
> Likely because we use nio Charset in the String implementation, and that
> in turn eventually calls String.toUpperCase(), in CharsetProviderImpl
> lines 113 and 145.
>
>
Really, I think this is a bug in CharsetProviderImpl.
Because it calls toUpperCase() with the default Locale, if I am on a Turkish
computer and I do Charset.forName("us-ascii"),
it's going to uppercase it Turkish-style (i becomes the dotted capital İ,
U+0130) and it won't find the charset. But it will work fine most anywhere
else.

I think this should be using Locale.ENGLISH here, as it's a case-insensitive
comparison, not intended for display.
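
To make the difference concrete, here is a minimal sketch. The class and the
canonical() helper are hypothetical stand-ins for whatever CharsetProviderImpl
does with the name before its lookup; I have not copied the actual code. It
only shows how the same input uppercases under a Turkish locale versus
Locale.ENGLISH.

```java
import java.util.Locale;

final class CharsetNameCase {
    // Hypothetical helper: uppercases a charset name before a lookup.
    // The locale is a parameter so both behaviours can be compared.
    static String canonical(String name, Locale locale) {
        return name.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules, 'i' maps to U+0130, giving "US-ASC\u0130\u0130".
        String viaTurkish = canonical("us-ascii", turkish);
        // Under English rules, 'i' maps to plain 'I'.
        String viaEnglish = canonical("us-ascii", Locale.ENGLISH);

        System.out.println("US-ASCII".equals(viaTurkish)); // false
        System.out.println("US-ASCII".equals(viaEnglish)); // true
    }
}
```

Passing the locale explicitly keeps the result independent of whatever the
machine's default Locale happens to be.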

Or, optionally, something closer to Unicode's 'simple case folding' could be
available for situations like this, one that simply iterates through the
string and does the locale-insensitive Character.toLowerCase(i) on each
character.
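
A rough sketch of what such a helper might look like follows. The class and
method names are made up for illustration, and this is only an approximation
of simple case folding (it uses the one-to-one lowercase mapping, not the
Unicode CaseFolding data). It walks the string by code point rather than by
char, so supplementary characters are handled too.

```java
final class SimpleFold {
    // Lowercases each code point with the locale-independent
    // Character.toLowerCase(int). Only one-to-one mappings are applied.
    static String toLowerCaseSimple(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // 2 for supplementary code points
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLowerCaseSimple("US-ASCII")); // us-ascii

        // U+10400 DESERET CAPITAL LETTER LONG I lowercases to U+10428.
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(toLowerCaseSimple(deseret).codePointAt(0) == 0x10428); // true
    }
}
```

Since it never consults a Locale, it gives the same answer on a Turkish
machine as anywhere else.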


-- 
Robert Muir
rcmuir@gmail.com
