harmony-dev mailing list archives

From "Andrew Zhang" <zhanghuang...@gmail.com>
Subject Re: [classlib][luni][charset]Strange behavior of UnicodeBig
Date Thu, 19 Oct 2006 06:39:32 GMT
On 10/19/06, Tony Wu <wuyuehao@gmail.com> wrote:
>
> Thank you all,
> This is not just a naming issue.
> A mapping only works if ICU actually supports the target charset, and
> AFAIK ICU does not implement UnicodeBig; see [1].
> Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a workaround [2]?


No, I don't think so. The only difference between "UnicodeBig" and
"UTF-16BE" is the presence of a byte-order mark, so it should be easy to wrap
"UTF-16BE" as "UnicodeBig" for java.io/java.lang: just put 0xFE 0xFF at the
beginning of the output and then encode the buffer as "UTF-16BE". Am I missing
something?
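
A minimal sketch of that wrapping idea, to make the byte layout concrete. The
class and method names are hypothetical (this is neither Harmony nor RI code),
and it uses java.nio.charset.StandardCharsets, which is newer than the J2SE 5
API discussed in this thread:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Harmony or RI code.
class UnicodeBigSketch {
    /** Encodes s as UTF-16BE and prepends the big-endian byte-order mark 0xFE 0xFF. */
    static byte[] encode(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE); // UTF-16BE itself writes no BOM
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    /** Decodes big-endian UTF-16 bytes, skipping a leading 0xFE 0xFF if present. */
    static String decode(byte[] bytes) {
        boolean hasBom = bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF;
        int offset = hasBom ? 2 : 0;
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_16BE);
    }
}
```

The decode side skips the mark explicitly because a plain "UTF-16BE" decoder
would otherwise hand it back as a U+FEFF character at the start of the string.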

> [1] http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
>
> [2]
> UTF-16
> Sixteen-bit UCS Transformation Format, byte order identified by an
> optional byte-order mark
> UnicodeBig
> Sixteen-bit Unicode Transformation Format, big-endian byte order,
> with byte-order mark
> UnicodeLittle
> Sixteen-bit Unicode Transformation Format, little-endian byte order,
> with byte-order mark
>
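
To illustrate what the descriptions in [2] mean in practice, here is a small
sketch of how Java's "UTF-16" and "UTF-16BE" decoders treat a byte-order mark.
The class name and sample byte arrays are made up for the example, and it
relies on java.nio.charset.StandardCharsets from later Java versions:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the byte arrays are sample data for the letter 'a'.
class BomDecodeSketch {
    static final byte[] BIG_WITH_BOM = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x61 };
    static final byte[] LITTLE_WITH_BOM = { (byte) 0xFF, (byte) 0xFE, 0x61, 0x00 };

    /** "UTF-16" reads the byte-order mark, if any, to pick the byte order. */
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    /** "UTF-16BE" assumes big-endian and treats a leading mark as ordinary data. */
    static String decodeUtf16Be(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }
}
```

With these inputs, decodeUtf16 yields "a" for both arrays, while decodeUtf16Be
keeps the mark as a leading U+FEFF character. On the encoding side "UTF-16"
writes a big-endian mark, so mapping UnicodeLittle to it would change the
bytes produced.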
> On 10/17/06, Paulex Yang <paulex.yang@gmail.com> wrote:
> > Tony Wu wrote:
> > > Thank you Andrew,
> > > I think I got the point. In the RI, j.l.String uses the java.io set of
> > > encoding names, whereas Charset.forName uses the separate java.nio set.
> > >
> > > The new question is: shall we follow the spec [1] and support both
> > > sets of charset names? I just had a look and found that we do not
> > > support some canonical names for the java.io and java.lang APIs, such as
> > > UnicodeBigUnmarked, UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> > InputStreamReader already has such a charset name mapping. I think we
> > have no choice but to support these legacy charset names; you may need
> > some refactoring to make these classes share the same mapping data.
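
One possible shape for such shared mapping data, as a rough sketch only. The
class name, method name, and table layout are hypothetical, and the table
lists just the two unmarked names, whose java.nio equivalents are UTF-16BE and
UTF-16LE; the marked variants would additionally need byte-order-mark handling:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical shared lookup table; not the actual Harmony data structure.
class LegacyCharsetNames {
    private static final Map<String, String> LEGACY_TO_NIO = new HashMap<>();

    static {
        // Keys are stored lower-cased so that lookups are case-insensitive.
        LEGACY_TO_NIO.put("unicodebigunmarked", "UTF-16BE");
        LEGACY_TO_NIO.put("unicodelittleunmarked", "UTF-16LE");
    }

    /** Returns the java.nio name for a legacy java.io name, or the input unchanged. */
    static String toNioName(String name) {
        String mapped = LEGACY_TO_NIO.get(name.toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : name;
    }
}
```

String, InputStreamReader, and similar classes could then all resolve names
through the same method before asking Charset for a decoder.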
> > >
> > > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > >
> > > On 10/17/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > >> On 10/17/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > >> >
> > >> >
> > >> >
> > >> > On 10/17/06, Leo Li <liyilei1979@gmail.com> wrote:
> > >> > >
> > >> > > I think Harmony is more reasonable.
> > >> > >
> > >> > > As the spec says, if Charset.forName("UnicodeBig") throws
> > >> > > UnsupportedCharsetException, then no support for the named charset
> > >> > > is available in this instance of the Java virtual machine. Then how
> > >> > > can new String(b, "UnicodeBig") succeed on the same JVM? The spec for
> > >> > > String(byte[] bytes, String charsetName) also says that if the named
> > >> > > charset is not supported, UnsupportedEncodingException should be
> > >> > > thrown.
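
The consistency question raised here can be checked directly. The following is
a small probe under assumed names (the class and methods are hypothetical);
what it reports for "UnicodeBig" depends on the JVM it runs on, so no result
is claimed for that name:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical probe; pass only syntactically legal charset names.
class CharsetSupportProbe {
    /** True if the java.nio lookup knows the name. */
    static boolean nioSupports(String name) {
        return Charset.isSupported(name);
    }

    /** True if the String(byte[], String) constructor accepts the name. */
    static boolean stringSupports(String name) {
        try {
            new String(new byte[0], name);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }
}
```

Comparing nioSupports("UnicodeBig") with stringSupports("UnicodeBig") on a
given JVM shows whether the two lookups agree there.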
> > >> >
> > >> >
> > >> > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better support
> > >> > such a mapping in String and follow the RI.
> > >> >
> > >>
> > >> You can find the supported encodings in the spec [1].
> > >>
> > >> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > >>
> > >>  On 10/17/06, Tony Wu <wuyuehao@gmail.com> wrote:
> > >> > > >
> > >> > > > Hi all,
> > >> > > > I found this when I tried to debug the failing Ant tests on
> > >> > > > Harmony. Note the output of the test cases below.
> > >> > > >
> > >> > > > import java.io.UnsupportedEncodingException;
> > >> > > > import java.nio.charset.Charset;
> > >> > > > import junit.framework.TestCase;
> > >> > > >
> > >> > > > public class TestCharset extends TestCase {
> > >> > > >    public void test1() throws UnsupportedEncodingException {
> > >> > > >        byte[] b = new byte[] { 'a', 'b', 'c' };
> > >> > > >        String s = new String(b, "UnicodeBig");
> > >> > > >        assertEquals("abc", s);
> > >> > > >    }
> > >> > > >
> > >> > > >    public void test2() {
> > >> > > >        Charset.forName("UnicodeBig");
> > >> > > >    }
> > >> > > > }
> > >> > > >
> > >> > > > RI:
> > >> > > > test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> > >> > > > test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > >> > > >
> > >> > > > Harmony:
> > >> > > > test1: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > >> > > > test2: java.nio.charset.UnsupportedCharsetException: The unsupported charset name is "UnicodeBig"
> > >> > > >
> > >> > > > It seems the RI recognizes *UnicodeBig* in the j.l.String
> > >> > > > constructor, whereas Harmony does not support this alias at all.
> > >> > > >
> > >> > > > Do you have any concerns about that?
> > >> > > > --
> > >> > > > Tony Wu
> > >> > > > China Software Development Lab, IBM
> > >> > > >
> > >> > > >
> > >> > > > ---------------------------------------------------------------------
> > >> > > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > >> > > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > >> > > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> > >> > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Leo Li
> > >> > > China Software Development Lab, IBM
> > >> > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Best regards,
> > >> > Andrew Zhang
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Best regards,
> > >> Andrew Zhang
> > >>
> > >>
> > >
> > >
> >
> >
> > --
> > Paulex Yang
> > China Software Development Lab
> > IBM
> >
> >
> >
> >
>
>
> --
> Tony Wu
> China Software Development Lab, IBM
>
>
>


-- 
Best regards,
Andrew Zhang
