harmony-dev mailing list archives

From Tim Ellison <t.p.elli...@gmail.com>
Subject Re: [classlib][luni] String.toLowerCase/toUpperCase incorrect for supplementary characters (HARMONY-6649)
Date Fri, 24 Sep 2010 02:33:47 GMT
So you'll see that I reverted the new String impl until we fix up the
possible cases in the boot sequence.

I was reviewing the callers of toLowerCase(), and see that
File#hashCode() uses that method (not in boot mind you).

The Java SE 6 spec for that method [1] has been clarified to add the
fact that when lowercasing the pathname on Windows the "Locale is not
taken into account on lowercasing the pathname string."

Thinking out loud -- so how does that work? How can we lowercase a
Unicode string without consideration of a locale?
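One possible reading, and this is only a guess at what the spec intends, is that "not taken into account" means the default locale is ignored and only the locale-independent case mappings are applied. The sketch below illustrates the difference with the standard String#toLowerCase(Locale) and Locale.ROOT; the class and method names are hypothetical and are not taken from Harmony or from the File implementation.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Harmony or the JDK.
final class LocaleLowerCaseDemo {

    // Lowercase using only the locale-independent (root) case mappings.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Lowercase using Turkish rules, where 'I' maps to dotless i (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(new Locale("tr", "TR"));
    }

    public static void main(String[] args) {
        String name = "FILE.TXT";
        // The two results differ in the second character.
        System.out.println(lowerRoot(name));
        System.out.println(lowerTurkish(name));
        System.out.println(lowerRoot(name).equals(lowerTurkish(name)));
    }
}
```

If File#hashCode() used the default-locale overload, the same pathname could hash differently under a Turkish default locale than under an English one, which may be what the clarified wording is meant to rule out.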

[1]
http://download.oracle.com/javase/6/docs/api/java/io/File.html#hashCode%28%29

Regards,
Tim


On 23/Sep/2010 12:21, Robert Muir wrote:
> On Wed, Sep 22, 2010 at 10:33 PM, Tim Ellison <t.p.ellison@gmail.com> wrote:
> 
>> On 23/Sep/2010 01:10, Robert Muir (JIRA) wrote:
>>> I thought about this too,
>>>
>>> one concern (not knowing if there are more cases involved) would be
>>> if the input "should" be ascii, but "could" be something else. if
>>> String.toLowerCase had the ascii special-case with a fallback to ICU,
>>> it could fail less gracefully in such a situation if it encountered
>>> non-ascii rather than simply not matching, especially since unit
>>> tests tend to have more coverage for the ascii case...
>>>
>>> ...but this might be theoretical
>> Fail less gracefully than what?  Today, by using String#toLowerCase(),
>> invalid ascii gets passed into ICU, so it will get converted as though it
>> were a valid char encoding, so I don't think it would make anything worse
>> than it is today.
>>
> 
> well, what I meant to say is that the auto-detect idea seems a bit shaky. if
> something wants to do an ascii-only uppercase/lowercase before ICU is
> available, and we know we cannot load ICU yet, then I think the
> toASCIILowerCase is much better than calling String.toLowerCase and saying
> "yeah we know the input is all ascii, it won't load ICU".
> 
> The toASCIILowerCase will never load ICU, doesn't depend on an
> implementation detail of String, and then it's explicit in the code what is
> going on.
> 
> 
>> I think the debate is whether to find and fix places in the class library
>> code where we know the input is ascii and change uses of
>> String#toLowerCase to use
>> org.apache.harmony.luni.util.Util#toASCIILowerCase() [1]
>>
> 
> +1, I think this is the best solution.
> 
> 
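For readers following the thread, here is a minimal sketch of what an ASCII-only lowercase helper along the lines discussed above might look like. It is an illustration under stated assumptions, not the actual Util#toASCIILowerCase() source: the class name and method name below are hypothetical. The point it shows is that only 'A'..'Z' are touched, so no locale data or ICU is ever consulted, and non-ASCII input passes through unchanged rather than being converted.

```java
// Hypothetical helper; a sketch, not Harmony's Util#toASCIILowerCase().
final class AsciiCase {

    // Lowercases only the ASCII letters 'A'..'Z'. Every other char,
    // including surrogates of supplementary characters, is left as-is.
    static String toAsciiLowerCase(String s) {
        char[] chars = null; // allocated lazily, only if a change is needed
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                if (chars == null) {
                    chars = s.toCharArray();
                }
                chars[i] = (char) (c + ('a' - 'A'));
            }
        }
        return chars == null ? s : new String(chars);
    }
}
```

Because the method relies only on char arithmetic, it could in principle be called during the boot sequence before ICU is loadable, and a call site that uses it states explicitly that ASCII-only semantics are intended.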
