harmony-dev mailing list archives

From "Tony Wu" <wuyue...@gmail.com>
Subject Re: [classlib][luni][charset]Strange behavior of UnicodeBig
Date Thu, 19 Oct 2006 03:36:37 GMT
Thank you all,
This is not just a naming issue.
A mapping only works if ICU actually supports the target charset.
AFAIK UnicodeBig is not implemented by ICU; see [1].
Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a workaround [2]?

[1]http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co

[2]
UTF-16
 Sixteen-bit UCS Transformation Format, byte order identified by an
 optional byte-order mark
UnicodeBig
 Sixteen-bit Unicode Transformation Format, big-endian byte order,
 with byte-order mark
UnicodeLittle
 Sixteen-bit Unicode Transformation Format, little-endian byte order,
 with byte-order mark
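
A minimal sketch of the workaround proposed above, assuming the legacy names are translated to NIO charset names before the lookup. The class name LegacyCharsetNames, its table, and the helpers resolve and decode are hypothetical and are not taken from Harmony or the RI. The entries for the "Unmarked" names are an assumption based on the encoding guide cited in this thread.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class LegacyCharsetNames {
    // Hypothetical table: legacy java.io/java.lang names -> NIO charset names.
    private static final Map<String, String> LEGACY_TO_NIO = new HashMap<>();
    static {
        // Workaround from this thread: both BOM-marked variants go to UTF-16,
        // whose decoder picks the byte order from the byte-order mark.
        LEGACY_TO_NIO.put("unicodebig", "UTF-16");
        LEGACY_TO_NIO.put("unicodelittle", "UTF-16");
        // Assumption: the unmarked variants correspond to the fixed-order charsets.
        LEGACY_TO_NIO.put("unicodebigunmarked", "UTF-16BE");
        LEGACY_TO_NIO.put("unicodelittleunmarked", "UTF-16LE");
    }

    // Returns the mapped NIO name, or the input unchanged if it is not a legacy name.
    static String resolve(String name) {
        String mapped = LEGACY_TO_NIO.get(name.toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : name;
    }

    // Decodes bytes after translating the charset name through the table.
    static String decode(byte[] bytes, String name) {
        return new String(bytes, Charset.forName(resolve(name)));
    }

    public static void main(String[] args) {
        // "a" in UTF-16 with a big-endian and a little-endian byte-order mark.
        byte[] bigWithBom = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x61 };
        byte[] littleWithBom = { (byte) 0xFF, (byte) 0xFE, 0x61, 0x00 };
        System.out.println(decode(bigWithBom, "UnicodeBig"));
        System.out.println(decode(littleWithBom, "UnicodeLittle"));
    }
}
```

This only covers decoding of input that carries a byte-order mark. When encoding, Java's UTF-16 writes big-endian data with a big-endian mark, so output for UnicodeLittle would not match its description in [2]. Also note that the three single-byte values used in test1 are not a UTF-16 encoding of "abc", so that assertion would still fail once the name is recognized.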

On 10/17/06, Paulex Yang <paulex.yang@gmail.com> wrote:
> Tony Wu wrote:
> > Thank you Andrew,
> > I think I got the point. In the RI, j.l.String uses the java.io
> > encoding names, whereas Charset.forName uses the NIO ones.
> >
> > The new question is whether we should follow the spec [1] and support
> > both sets of charset names. I just had a look and found that we do not
> > support some canonical names for the java.io and java.lang APIs, such as
> > UnicodeBigUnmarked, UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> There is such a charset name mapping in InputStreamReader. I think we
> have no choice but to support these legacy charset names; you may need
> some refactoring to make these classes use the same mapping data.
> >
> > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> >
> > On 10/17/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> >> On 10/17/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> >> >
> >> >
> >> >
> >> > On 10/17/06, Leo Li <liyilei1979@gmail.com> wrote:
> >> > >
> >> > > I think Harmony is more reasonable.
> >> > >
> >> > > As the spec says, if Charset.forName("UnicodeBig") throws
> >> > > UnsupportedCharsetException, then no support for the named charset is
> >> > > available in this instance of the Java virtual machine. Then how can we
> >> > > get new String(b, "UnicodeBig") without throwing UnsupportedCharsetException
> >> > > on the same JVM? The spec for String(byte[] bytes, String charsetName) also
> >> > > says that if the named charset is not supported, UnsupportedCharsetException
> >> > > should be thrown.
> >> >
> >> >
> >> > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better support such
> >> > a mapping in String and follow the RI.
> >> >
> >>
> >> You can find the set of encodings in the spec [1].
> >>
> >> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> >>
> >>  On 10/17/06, Tony Wu <wuyuehao@gmail.com> wrote:
> >> > > >
> >> > > > Hi all,
> >> > > > I found this while debugging the failing Ant tests on
> >> > > > Harmony. Note the output of the test cases below.
> >> > > >
> >> > > > import java.io.UnsupportedEncodingException;
> >> > > > import java.nio.charset.Charset;
> >> > > > import junit.framework.TestCase;
> >> > > >
> >> > > > public class TestCharset extends TestCase {
> >> > > >    public void test1() throws UnsupportedEncodingException {
> >> > > >        byte[] b = new byte[] { 'a', 'b', 'c' };
> >> > > >        String s = new String(b, "UnicodeBig");
> >> > > >        assertEquals("abc", s);
> >> > > >    }
> >> > > >
> >> > > >    public void test2() {
> >> > > >        Charset.forName("UnicodeBig");
> >> > > >    }
> >> > > > }
> >> > > >
> >> > > > RI:
> >> > > > test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> >> > > > test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> >> > > >
> >> > > > Harmony:
> >> > > > test1: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> >> > > > test2: java.nio.charset.UnsupportedCharsetException: The unsupported
> >> > > > charset name is "UnicodeBig"
> >> > > >
> >> > > > It seems the RI recognizes *UnicodeBig* in the j.l.String constructor,
> >> > > > whereas Harmony does not support this alias at all.
> >> > > >
> >> > > > Do you have any concern about that?
> >> > > > --
> >> > > > Tony Wu
> >> > > > China Software Development Lab, IBM
> >> > > >
> >> > > >
> >> > > > ---------------------------------------------------------------------
> >> > > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> >> > > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> >> > > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> >> > > >
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Leo Li
> >> > > China Software Development Lab, IBM
> >> > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > Andrew Zhang
> >>
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Andrew Zhang
> >>
> >>
> >
> >
>
>
> --
> Paulex Yang
> China Software Development Lab
> IBM
>
>
>
>


-- 
Tony Wu
China Software Development Lab, IBM


