harmony-dev mailing list archives

From "Tony Wu" <wuyue...@gmail.com>
Subject [classlib][text] regression in text module, a non-bug difference?
Date Tue, 19 Feb 2008 16:20:12 GMT
Hi all,

I'm investigating the regression [1] in the text module. All 5
failures come down to one cause: support for the traditional Spanish
character "ch". Here is my understanding.

My fix for HARMONY-5465 makes Locale.toString() compatible with the
RI. Before my commit, toString() on a Locale with an empty "country"
field produced only one underscore, whereas the RI produces two. For
instance, new Locale("es","","TRADITIONAL").toString() returned
"es_TRADITIONAL" in Harmony but returns "es__TRADITIONAL" in the RI.
Interestingly, ICU uses the output of toString() as the key that
identifies its Locale instance. In other words, the 5 testcases passed
before only because they were never run against the real traditional
Spanish locale, so "ch" was interpreted as two separate characters,
"c" and "h". That is why we could set the offset to 1 in our
testcases. After my commit, ICU finds the right Spanish locale, so
its behavior is compatible with the spec [2].
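
A minimal sketch of the toString() difference described above. The
class and method names are hypothetical and only for illustration; on
the RI the first line printed is the two-underscore form. The second
line just reproduces the single-underscore string that Harmony
returned before the fix, by string replacement rather than by calling
Harmony code.

```java
import java.util.Locale;

public class LocaleToStringSketch {
    // Builds the locale from the message (empty country, variant set)
    // and returns its string form.
    static String traditionalSpanishName() {
        return new Locale("es", "", "TRADITIONAL").toString();
    }

    public static void main(String[] args) {
        String riForm = traditionalSpanishName();
        // Form produced by the RI: language, empty country, variant.
        System.out.println(riForm);
        // Illustration only: the pre-fix Harmony form had one underscore.
        System.out.println(riForm.replace("__", "_"));
    }
}
```

Any code that uses this string as a lookup key, as ICU is described
as doing above, will resolve a different locale depending on which of
the two forms it receives.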

One strange thing is that I cannot get the traditional Spanish locale
in the RI. The RI behaves the same no matter whether the variant
"TRADITIONAL" is present or not. The spec does not say anything about
"traditional", but from a Google search I learned that the character
"ch" has been dropped from Spanish since 1998. I suppose that the RI
changed the behavior of the Spanish locale but the spec was never
updated accordingly.

BTW, for the normal Spanish locale (new Locale("es","ES")), we have
the same behavior as the RI. It seems ICU supports traditional Spanish
in the form of new Locale("es","","TRADITIONAL") but the RI does not.
Run the testcase below [3] on the RI to see the difference.

Is anyone here familiar with Spanish? I need your advice.

[1]
http://people.apache.org/~smishura/r628209/Windows_x86/classlib-test/

[2]
The spec says:
For example, consider the following in Spanish:

 "ca" -> the first key is key('c') and second key is key('a').
 "cha" -> the first key is key('ch') and second key is key('a').
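
The contraction behavior in this spec example can be sketched with
java.text alone, independent of any locale data. The two rule strings
below are hypothetical toy rules written for this illustration (they
are not the RI's or ICU's Spanish rules), and the class and method
names are made up. With "ch" declared as a contraction, "cha" should
be split into two collation elements; without it, into three.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class ContractionSketch {
    // Hypothetical toy rules: the first treats "ch" as one unit.
    static final String WITH_CH = "< a < c < ch < h";
    static final String WITHOUT_CH = "< a < c < h";

    // Counts the collation elements produced for the given text.
    static int countElements(String rules, String text) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            CollationElementIterator it =
                    collator.getCollationElementIterator(text);
            int count = 0;
            while (it.next() != CollationElementIterator.NULLORDER) {
                count++;
            }
            return count;
        } catch (ParseException e) {
            // The toy rules are fixed strings, so this is unexpected.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with ch:    " + countElements(WITH_CH, "cha"));
        System.out.println("without ch: " + countElements(WITHOUT_CH, "cha"));
    }
}
```

This mirrors the difference reported for testcase [3]: a collator that
knows the "ch" contraction yields one element fewer for "cha".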


[3]
        RuleBasedCollator rbColl = (RuleBasedCollator) Collator
                .getInstance(new Locale("es", "", "TRADITIONAL"));
        String text = "cha";
        CollationElementIterator iterator = rbColl
                .getCollationElementIterator(text);
        int keyNum = 0;
        while (iterator.next() != -1) {
            keyNum++;
        }
        System.out.println("RI has " + keyNum + " keys");

        com.ibm.icu.text.RuleBasedCollator r =
                (com.ibm.icu.text.RuleBasedCollator) com.ibm.icu.text.Collator
                .getInstance(new Locale("es", "", "TRADITIONAL"));
        com.ibm.icu.text.CollationElementIterator it = r
                .getCollationElementIterator(text);
        keyNum = 0;
        while (it.next() != -1) {
            keyNum++;
        }
        System.out.println("ICU has " + keyNum + " keys");



The output is:
RI has 3 keys
ICU has 2 keys


-- 
Tony Wu
China Software Development Lab, IBM
