harmony-commits mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HARMONY-6649) String.toLowerCase/toUpperCase incorrect for supplementary characters
Date Thu, 16 Sep 2010 11:32:33 GMT

    [ https://issues.apache.org/jira/browse/HARMONY-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910101#action_12910101 ]

Robert Muir commented on HARMONY-6649:
--------------------------------------

Does this mean Harmony might need these methods for its own internal use before ICU is available?

For toLowerCase(Locale)/toUpperCase(Locale), perhaps String.java could do:

if (locale is not Turkish, Azeri, or Lithuanian)
  while (ch <= 0x7f)
     ( just do optimized fast subtraction/addition )
...
// on the first non-ASCII char, bail out completely and invoke 'UCharacter.xxx'

This might be good for performance. And Harmony itself, if it uses this method at
this point, is likely using Locale.ENGLISH or similar for consistent behavior (filenames,
etc.). Sorry, I'm not too familiar with the codebase, so I'm not sure it would work. But
it might speed up 'typical' lowercasing in any case. As for the worst-case 2x cost of the
"special casing": I find that special casing is going to be slow anyway, e.g. the Greek
sigma example needs to calculate word boundaries!
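
As a rough illustration of the fast path suggested above, here is a minimal Java sketch
for the lowercase direction. The class and method names are hypothetical, and
String.toLowerCase(Locale) is used only as a stand-in for the full ICU-backed path
(UCharacter.xxx); this is not Harmony's actual code.

```java
import java.util.Locale;

// Hypothetical sketch: ASCII fast path with a complete bail-out to the full algorithm.
final class AsciiFastPathLowerCase {

    // Languages named in the comment above as needing the full casing rules.
    static boolean needsFullCasing(Locale locale) {
        String lang = locale.getLanguage();
        return lang.equals("tr") || lang.equals("az") || lang.equals("lt");
    }

    static String toLowerCase(String s, Locale locale) {
        if (needsFullCasing(locale)) {
            return s.toLowerCase(locale); // stand-in for the ICU-backed path
        }
        char[] out = null; // allocated lazily, only if something changes
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch > 0x7F) {
                // First non-ASCII char: give up on the fast path entirely.
                return s.toLowerCase(locale);
            }
            if (ch >= 'A' && ch <= 'Z') {
                if (out == null) {
                    out = s.toCharArray();
                }
                out[i] = (char) (ch + ('a' - 'A')); // plain addition for ASCII
            }
        }
        return out == null ? s : new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toLowerCase("README.TXT", Locale.ENGLISH));
    }
}
```

The bail-out restarts from the beginning of the string, so the worst case pays for one
partial ASCII scan plus the full algorithm, which is where the "2x" figure comes from.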

The Turkish/Azeri case is trickier than the existing code allows for; I think it should use
UCharacter.XXX too.
The reason is that it has to handle the case from SpecialCasing where the 'dotted I'
is written in decomposed form (e.g. I + COMBINING DOT ABOVE).
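
To make the decomposed case concrete, the following Java sketch lowercases the three
relevant inputs under a Turkish locale and prints the resulting code points. The class
and helper names are hypothetical; the output depends on how completely the runtime
implements the SpecialCasing conditions, so no result is asserted for the decomposed
input here.

```java
import java.util.Locale;

// Hypothetical demo of Turkish lowercasing for plain, precomposed, and decomposed I.
final class TurkishDottedI {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    static String lower(String s) {
        return s.toLowerCase(TURKISH);
    }

    // Renders a string as a space-separated list of U+XXXX code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Plain capital I: lowercases to dotless i (U+0131) in Turkish.
        System.out.println(codePoints(lower("I")));
        // Precomposed LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
        System.out.println(codePoints(lower("\u0130")));
        // Decomposed form: I followed by COMBINING DOT ABOVE (U+0307).
        // A context-free per-character mapping would wrongly turn the I into U+0131.
        System.out.println(codePoints(lower("I\u0307")));
    }
}
```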


> String.toLowerCase/toUpperCase incorrect for supplementary characters
> ---------------------------------------------------------------------
>
>                 Key: HARMONY-6649
>                 URL: https://issues.apache.org/jira/browse/HARMONY-6649
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>    Affects Versions: 5.0M15
>            Reporter: Robert Muir
>
> Simple testcase:
> {code}
>     assertEquals("\uD801\uDC44", "\uD801\uDC1C".toLowerCase());
> {code}
> Looking at modules/luni/src/main/java/java/lang/String.java, the problem is that these methods
> iterate code units (char), not code points (int),
> and use Character.toLowerCase(char) and Character.toUpperCase(char) instead of
> Character.toLowerCase(int) and Character.toUpperCase(int).
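
The difference described in the issue can be sketched in Java as follows. Both methods
are simplified, locale-independent illustrations with hypothetical names, not the code in
String.java: the first maps each char on its own, the second walks whole code points.

```java
// Hypothetical sketch contrasting per-char and per-code-point lowercasing.
final class CodePointLowerCase {

    // Per code unit: surrogate halves are mapped individually and stay unchanged.
    static String perChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Per code point: a surrogate pair is decoded, mapped, and re-encoded as a unit.
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String upper = "\uD801\uDC1C"; // U+1041C, the testcase input from the issue
        System.out.println(perChar(upper).equals(upper));                  // input left as-is
        System.out.println(perCodePoint(upper).equals("\uD801\uDC44"));    // mapped to U+10444
    }
}
```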

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
