harmony-dev mailing list archives

From Spark Shen <smallsmallor...@gmail.com>
Subject Re: [classlib][luni]difference between RI and ICU
Date Tue, 12 Sep 2006 07:03:03 GMT
Robert Hu wrote:
> Tony Wu wrote:
>> I encountered a problem when implementing isWhitespace(int) in
>> j.l.Character. There is a difference between the RI and ICU.
>> The RI spec says,
>>> It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
>>> PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
>>> '\u2007', '\u202F').
> Anyway, the spec is the first rule we follow.
Information from unicode.org is also a spec, and unicode.org is the more
authoritative source. Since the RI follows unicode.org, we should follow
the RI, which in turn means following unicode.org.
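
To make the comparison concrete, here is a minimal sketch that asks a running JDK how it classifies the code points under discussion. It assumes a current JDK and uses only java.lang.Character; the class name WhitespaceCheck and the helper isZs are made up for illustration and are not part of the Harmony tree.

```java
// Sketch only: WhitespaceCheck and isZs are illustrative names, not Harmony code.
final class WhitespaceCheck {

    // True when the code point is in general category Zs (SPACE_SEPARATOR).
    static boolean isZs(int codePoint) {
        return Character.getType(codePoint) == Character.SPACE_SEPARATOR;
    }

    public static void main(String[] args) {
        // The three no-break spaces named in the RI spec, U+FEFF from the
        // ICU spec, and plain SPACE for comparison.
        int[] codePoints = {0x00A0, 0x2007, 0x202F, 0xFEFF, 0x0020};
        for (int cp : codePoints) {
            System.out.printf("U+%04X  Zs=%b  isWhitespace=%b%n",
                    cp, isZs(cp), Character.isWhitespace(cp));
        }
    }
}
```

On a JDK that follows the spec text quoted above, U+2007 should report Zs=true but isWhitespace=false, while U+FEFF should not be in Zs at all.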

>> but the ICU spec says,
>>> It is a Unicode space separator (category "Zs"), but is not a no-break
>>> space (\u00A0 or \u202F or \uFEFF).
>> The RI excludes U+2007, whereas ICU excludes U+FEFF.
>> And I looked up the definitions of these 4 related characters on
>> unicode.org:
>>> 00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
>>> 2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
>>> 202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
> So cool... :-)
>> I consider this a bug in ICU, because U+FEFF is not in category *Zs*
>> as the ICU spec describes. I propose to report that to the ICU team.
>> Should we handle U+2007 ourselves to follow the RI, or just document
>> this problem in the test case?
> IMO, it's natural to follow the RI; the challenge is to fix it
> gracefully on top of the ICU implementation.
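
One possible shape for such a fix, as a rough sketch only: keep delegating to ICU and special-case the one code point where the two specs disagree. The names RiWhitespace and ICU_LIKE are hypothetical; ICU_LIKE is a stand-in built from the ICU spec text quoted above, not a call into ICU itself, and it covers only the Zs rule discussed in this thread (Java 8 or later is assumed for IntPredicate).

```java
import java.util.function.IntPredicate;

// Sketch only: RiWhitespace and ICU_LIKE are illustrative names.
final class RiWhitespace {

    // Stand-in for the ICU rule as quoted in this thread: category Zs,
    // minus U+00A0, U+202F and U+FEFF. This is an assumption, not ICU code.
    static final IntPredicate ICU_LIKE = cp ->
            Character.getType(cp) == Character.SPACE_SEPARATOR
                    && cp != 0x00A0 && cp != 0x202F && cp != 0xFEFF;

    // Applies the RI exclusion for U+2007 before consulting the delegate.
    static boolean isWhitespace(int codePoint, IntPredicate delegate) {
        if (codePoint == 0x2007) {
            return false; // FIGURE SPACE: excluded by the RI spec, not by ICU's
        }
        return delegate.test(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(ICU_LIKE.test(0x2007));               // delegate alone
        System.out.println(isWhitespace(0x2007, ICU_LIKE));      // with RI override
    }
}
```

Keeping the override in one small wrapper means the special case can be dropped if ICU later changes its behaviour.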

Spark Shen
China Software Development Lab, IBM
