harmony-dev mailing list archives

From "Andrew Zhang" <zhanghuang...@gmail.com>
Subject Re: [classlib][luni]difference between RI and ICU
Date Tue, 12 Sep 2006 07:56:02 GMT
On 9/12/06, Tony Wu <wuyuehao@gmail.com> wrote:
>
> I encountered a problem when implementing isWhitespace(int) in j.l.Character.
> There is a difference between the RI and ICU.
>
> RI spec says,
>
>
> > It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
> > PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
> > '\u2007', '\u202F').
>
> but ICU spec says,
>
> > It is a Unicode space separator (category "Zs"), but is not a no-break
> > space (\u00A0 or \u202F or \uFEFF).
>
> The RI excludes U+2007, whereas ICU excludes U+FEFF.
>
> And I looked up the definition of these 4 related characters on
> unicode.org:
>
> > 00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
> > 2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
> > 202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
> > FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;
>
>
> I consider this a bug in ICU, because U+FEFF is not in category *Zs*,
> contrary to what the ICU spec implies. I propose to report that to the
> ICU team.
> Should we handle U+2007 ourselves to follow the RI, or just document this
> problem in a test case?


I think we could use a workaround at first, add a "FIXME:" comment before
the workaround, and write a corresponding test case.
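
Something like the following is what I have in mind, only as a rough
sketch. The names WhitespaceWorkaround, ICU_LIKE, and the delegate
parameter are made up for illustration, and ICU_LIKE is just a stand-in
that models the "Zs" clause of the ICU wording quoted above; it is not the
real ICU call, which would be plugged in as the delegate.

```java
import java.util.function.IntPredicate;

public class WhitespaceWorkaround {

    // Hypothetical stand-in for the ICU-backed check. It models only the
    // quoted ICU wording: category Zs, minus U+00A0, U+202F and U+FEFF.
    static final IntPredicate ICU_LIKE = cp ->
            Character.getType(cp) == Character.SPACE_SEPARATOR
                    && cp != 0x00A0 && cp != 0x202F && cp != 0xFEFF;

    static boolean isWhitespace(int codePoint, IntPredicate delegate) {
        // FIXME: workaround for the RI/ICU difference on U+2007 (FIGURE
        // SPACE). The RI spec excludes it; the quoted ICU spec does not.
        // Revisit once the ICU team has answered the report.
        if (codePoint == 0x2007) {
            return false;
        }
        return delegate.test(codePoint);
    }

    public static void main(String[] args) {
        // The four code points discussed in this thread.
        int[] samples = {0x00A0, 0x2007, 0x202F, 0xFEFF};
        for (int cp : samples) {
            System.out.printf("U+%04X delegate=%b workaround=%b%n",
                    cp, ICU_LIKE.test(cp), isWhitespace(cp, ICU_LIKE));
        }
    }
}
```

The corresponding test case would then assert that all four code points
above are reported as non-whitespace, and that an ordinary space (U+0020)
still is.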

Once the ICU team responds (whether they accept or reject the report), we
can make a decision then.

> --
> Tony Wu
> China Software Development Lab, IBM
>
>


-- 
Andrew Zhang
China Software Development Lab, IBM
