harmony-commits mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HARMONY-6650) Character.getType(int) inconsistent with Character.getType(char): uses different version of unicode
Date Wed, 22 Sep 2010 04:14:33 GMT

    [ https://issues.apache.org/jira/browse/HARMONY-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913382#action_12913382 ]

Robert Muir commented on HARMONY-6650:
--------------------------------------

I don't mind working up a patch for this approach.

I have one last question though, that I've been trying to figure out related to this issue.

Harmony is using ICU 4.4.x (in other places too, I assume?), which means that
properties like these come from Unicode 5.2. But if I look here:

http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#unicode-version

The version is Unicode 4.0. Is this something that is not actually in the spec (but an
implementation detail)?
Or is it already a compatibility issue that Harmony uses this higher version of Unicode?

If it's a problem, I certainly don't have ideas on how to address it... but it would cause
lots of problems up the stack, like different rendering behavior and other issues.

For reference here is a diff between 4.0 and 5.2, to show all the differences in the UCD:
http://people.apache.org/~rmuir/unicodeDiff2.txt


> Character.getType(int) inconsistent with Character.getType(char): uses different version of unicode
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HARMONY-6650
>                 URL: https://issues.apache.org/jira/browse/HARMONY-6650
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>            Reporter: Robert Muir
>
> While looking at Character, I noticed the code looked very different for 'int' than for 'char' here.
> In particular, the int method defers to ICU, but the char method binary-searches its own table.
> The comment for that table is:
> // Unicode 3.0.1 (same as Unicode 3.0.0)
> private static final char[] typeValues ....
> But Unicode 3 is the wrong version for Java 5/6.
> So I tried a character whose type changed from 3.0 to 4.0, just to see.
> For example, compare these two results:
> Character.getType('\u17B5') = 8 (combining mark)
> Character.getType((int) '\u17B5') = 16 (format)
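
The comparison above can be reproduced with a small standalone check. This is only a sketch: the class name GetTypeCheck and its helper methods are hypothetical and are not part of Harmony or of any patch for this issue. It uses only the standard Character.getType(char) and Character.getType(int) overloads, and the numbers it prints depend on the Unicode version of the runtime it is executed on.

```java
public class GetTypeCheck {
    // Hypothetical helper: true when both overloads report the same
    // general category for a BMP character.
    static boolean overloadsAgree(char c) {
        return Character.getType(c) == Character.getType((int) c);
    }

    // Counts BMP characters for which the two overloads disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = Character.MIN_VALUE; cp <= Character.MAX_VALUE; cp++) {
            if (!overloadsAgree((char) cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        char c = '\u17B5'; // the character used in the report above
        System.out.println("getType(char) = " + Character.getType(c));
        System.out.println("getType(int)  = " + Character.getType((int) c));
        System.out.println("BMP mismatches: " + countMismatches());
    }
}
```

On a class library where both overloads read the same Unicode data, the mismatch count should be zero; a non-zero count would point to the kind of table divergence described in this issue.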

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

