harmony-dev mailing list archives

From "Andrew Zhang" <zhanghuang...@gmail.com>
Subject Re: [classlib][luni][charset]Strange behavior of UnicodeBig
Date Tue, 07 Nov 2006 10:47:27 GMT
On 11/7/06, Tony Wu <wuyuehao@gmail.com> wrote:
>
> Unlike the RI, our io/lang code uses the same charset implementation
> (ICU) as nio, and modifying ICU's code is not recommended. To fix this
> problem under that precondition, I would have to write a BOM before
> every encoding operation and handle the BOM before every decoding
> operation, which would obviously break the structure of our existing
> io/lang implementation.
> So I think supplying a Harmony SPI is easier and clearer.


Makes sense. :)
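
For readers following along, here is a rough sketch of the by-hand BOM handling described in the quoted paragraph above: prepend 0xFE 0xFF when encoding, skip it when decoding, and use plain UTF-16BE for the rest. The class and method names are made up for illustration and are not Harmony code.

```java
import java.nio.charset.Charset;

public class UnicodeBigBomSketch {
    // Big-endian byte-order mark: U+FEFF serialized as 0xFE 0xFF.
    private static final byte[] BOM_BE = { (byte) 0xFE, (byte) 0xFF };
    private static final Charset UTF_16BE = Charset.forName("UTF-16BE");

    /** Hypothetical helper: encode as UTF-16BE and prepend the BOM. */
    public static byte[] encodeWithBom(String s) {
        byte[] body = s.getBytes(UTF_16BE); // UTF-16BE itself writes no BOM
        byte[] out = new byte[BOM_BE.length + body.length];
        System.arraycopy(BOM_BE, 0, out, 0, BOM_BE.length);
        System.arraycopy(body, 0, out, BOM_BE.length, body.length);
        return out;
    }

    /** Hypothetical helper: skip a leading BOM if present, then decode as UTF-16BE. */
    public static String decodeSkippingBom(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 2 && bytes[0] == BOM_BE[0] && bytes[1] == BOM_BE[1]) {
            offset = BOM_BE.length;
        }
        return new String(bytes, offset, bytes.length - offset, UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeWithBom("abc");
        System.out.println(encoded.length);             // 8: 2 BOM bytes + 6 payload bytes
        System.out.println(decodeSkippingBom(encoded)); // abc
    }
}
```

Doing this at every encode/decode call site in io/lang is the structural cost the quoted paragraph refers to.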

On 11/7/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > On 11/6/06, Tony Wu <wuyuehao@gmail.com> wrote:
> > >
> > > Bad news: the ICU team refused to support UnicodeBig because it is
> > > not available in nio.
> > >
> > > The good news is that I found a smooth way to support these
> > > charsets. I tried implementing an SPI that accepts the name
> > > "UnicodeBig", and it worked. This way we could support any other
> > > charsets and fix the bug that the ICU team hesitated to fix. I think
> > > it also gives us extensibility. Do you have any concerns about
> > > implementing a Harmony SPI? I'll go on if no one objects.
> >
> >
> > Hey Tony, if we only consider io/lang support for UnicodeBig, would
> > things be simpler?
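
As an illustration of the SPI idea quoted above, here is a minimal sketch of a java.nio.charset.spi.CharsetProvider that accepts the name "UnicodeBig". It is not the Harmony implementation. The class names are hypothetical, and it assumes that delegating to the built-in "UTF-16" charset is close enough: per the Charset documentation, UTF-16 writes a big-endian BOM when encoding and defaults to big-endian when decoding, but it also honours a little-endian BOM on input, which a strict UnicodeBig might not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

public class UnicodeBigSpiSketch {

    /** Hypothetical charset named "UnicodeBig" that delegates to the built-in UTF-16. */
    static final class UnicodeBigCharset extends Charset {
        private final Charset utf16 = Charset.forName("UTF-16");

        UnicodeBigCharset() {
            super("UnicodeBig", null); // canonical name, no aliases
        }

        @Override public boolean contains(Charset cs) { return utf16.contains(cs); }
        @Override public CharsetDecoder newDecoder() { return utf16.newDecoder(); }
        @Override public CharsetEncoder newEncoder() { return utf16.newEncoder(); }
    }

    /** Hypothetical provider: answers only for the legacy name, null otherwise. */
    public static final class Provider extends CharsetProvider {
        @Override
        public Charset charsetForName(String name) {
            return "UnicodeBig".equalsIgnoreCase(name) ? new UnicodeBigCharset() : null;
        }

        @Override
        public Iterator<Charset> charsets() {
            return Collections.<Charset>singleton(new UnicodeBigCharset()).iterator();
        }
    }

    public static void main(String[] args) {
        // Called directly here; no service registration is involved in this demo.
        Charset cs = new Provider().charsetForName("UnicodeBig");
        ByteBuffer out = cs.encode("a");
        while (out.hasRemaining()) {
            System.out.printf("%02X ", out.get()); // FE FF 00 61
        }
        System.out.println();
    }
}
```

For Charset.forName to find such a provider, its class name would have to be listed in a META-INF/services/java.nio.charset.spi.CharsetProvider file on the class path. Note that registering it this way makes the name visible to nio as well, which is the concern about Charset.isSupported("UnicodeBig") raised further down the thread.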
> >
> > On 10/19/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > > > On 10/19/06, Tony Wu <wuyuehao@gmail.com> wrote:
> > > > >
> > > > > > I think supporting UnicodeBig in nio is not a bug but a feature.
> > > > > > The key point is: how can I get UnicodeBig supported in IO/Lang?
> > > >
> > > >
> > > > If ICU/NIO supports "UnicodeBig", wouldn't IO/LANG support
> > > "UnicodeBig"  as
> > > > well?
> > > >
> > > > On 10/19/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > > > > > On 10/19/06, Tony Wu <wuyuehao@gmail.com> wrote:
> > > > > > >
> > > > > > > > The implementation is from ICU, so I think we'd better not
> > > > > > > > wrap it ourselves. I'll post to the ICU mailing list and ask
> > > > > > > > whether they can help supply these legacy charsets.
> > > > > >
> > > > > >
> > > > > > > Hey Tony, please keep in mind that the following code [1] should
> > > > > > > print false and throw an UnsupportedCharsetException. If ICU
> > > > > > > provides "UnicodeBig" support, does it mean Harmony nio also
> > > > > > > supports "UnicodeBig"?
> > > > > > >
> > > > > > > [1]
> > > > > > > System.out.println(Charset.isSupported("UnicodeBig"));
> > > > > > > Charset.forName("UnicodeBig");
> > > > > >
> > > > > > On 10/19/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > > > > > > > On 10/19/06, Tony Wu <wuyuehao@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Thank you all,
> > > > > > > > > > It is not just an issue about the name.
> > > > > > > > > > The precondition for mapping is that ICU actually supports
> > > > > > > > > > this charset. AFAIK UnicodeBig is not implemented by ICU;
> > > > > > > > > > refer to [1].
> > > > > > > > > > Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a
> > > > > > > > > > workaround [2]?
> > > > > > > >
> > > > > > > >
> > > > > > > > > No, I don't think so. The only difference between "UnicodeBig"
> > > > > > > > > and "UTF-16BE" is whether a byte-order mark is present. So it
> > > > > > > > > should be easy to wrap "UTF-16BE" as "UnicodeBig" for
> > > > > > > > > java.io/java.lang: just put 0xFE 0xFF at the beginning of the
> > > > > > > > > bytes and then encode the buffer as "UTF-16BE". Am I missing
> > > > > > > > > something?
> > > > > > > >
> > > > > > > > > > [1] http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
> > > > > > > > >
> > > > > > > > > > [2]
> > > > > > > > > > UTF-16
> > > > > > > > > > Sixteen-bit UCS Transformation Format, byte order identified by
> > > > > > > > > > an optional byte-order mark
> > > > > > > > > > UnicodeBig
> > > > > > > > > > Sixteen-bit Unicode Transformation Format, big-endian byte
> > > > > > > > > > order, with byte-order mark
> > > > > > > > > > UnicodeLittle
> > > > > > > > > > Sixteen-bit Unicode Transformation Format, little-endian byte
> > > > > > > > > > order, with byte-order mark
> > > > > > > > >
> > > > > > > > > > On 10/17/06, Paulex Yang <paulex.yang@gmail.com> wrote:
> > > > > > > > > > Tony Wu wrote:
> > > > > > > > > > > > Thank you Andrew,
> > > > > > > > > > > > I think I got the point. The RI's j.l.String uses the IO
> > > > > > > > > > > > encoding set, whereas Charset.forName uses the NIO one.
> > > > > > > > > > > >
> > > > > > > > > > > > The new problem is: shall we follow the spec [1] and support
> > > > > > > > > > > > both suites of charset implementations? I just had a look and
> > > > > > > > > > > > found that we do not support some canonical names for the
> > > > > > > > > > > > java.io and java.lang API, such as UnicodeBigUnmarked,
> > > > > > > > > > > > UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> > > > > > > > > > > There is such a charset name mapping in InputStreamReader. I
> > > > > > > > > > > think we have no choice but to support these legacy charset
> > > > > > > > > > > names; you may need some refactoring work to make these
> > > > > > > > > > > classes use the same mapping data.
> > > > > > > > > > >
> > > > > > > > > > > > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > > > > > >
> > > > > > > > > > > > On 10/17/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > > > > > > > > > > >> On 10/17/06, Andrew Zhang <zhanghuangzhu@gmail.com> wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > > >> > On 10/17/06, Leo Li <liyilei1979@gmail.com> wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > I think Harmony is more reasonable.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > As the spec says, if Charset.forName("UnicodeBig") throws
> > > > > > > > > > > >> > > UnsupportedCharsetException, then no support for the named
> > > > > > > > > > > >> > > charset is available in this instance of the Java virtual
> > > > > > > > > > > >> > > machine. Then how can we get new String(b, "UnicodeBig")
> > > > > > > > > > > >> > > without an UnsupportedCharsetException being thrown on the
> > > > > > > > > > > >> > > same JVM? The spec for String(byte[] bytes, String charsetName)
> > > > > > > > > > > >> > > also says that if the named charset is not supported, an
> > > > > > > > > > > >> > > UnsupportedCharsetException should be thrown.
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > > >> > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better
> > > > > > > > > > > >> > support such a mapping in String and follow the RI.
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > > >> You can find the encoding set in the spec. [1]
> > > > > > > > > > > >>
> > > > > > > > > > > >> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > > > > > >>
> > > > > > > > > > > >> On 10/17/06, Tony Wu <wuyuehao@gmail.com> wrote:
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > >> > > > I found this when I tried to debug the failing tests of ant on
> > > > > > > > > > > >> > > > Harmony. Note the output of the test cases below.
> > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > import java.io.UnsupportedEncodingException;
> > > > > > > > > > > >> > > > import java.nio.charset.Charset;
> > > > > > > > > > > >> > > > import junit.framework.TestCase;
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > public class TestCharset extends TestCase {
> > > > > > > > > > > >> > > >    public void test1() throws UnsupportedEncodingException {
> > > > > > > > > > > >> > > >        byte[] b = new byte[] { 'a', 'b', 'c' };
> > > > > > > > > > > >> > > >        String s = new String(b, "UnicodeBig");
> > > > > > > > > > > >> > > >        assertEquals("abc", s);
> > > > > > > > > > > >> > > >    }
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > >    public void test2() {
> > > > > > > > > > > >> > > >        Charset.forName("UnicodeBig");
> > > > > > > > > > > >> > > >    }
> > > > > > > > > > > >> > > > }
> > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > RI:
> > > > > > > > > > > >> > > > test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> > > > > > > > > > > >> > > > test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Harmony:
> > > > > > > > > > > >> > > > test1: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > > > > > > >> > > > test2: java.nio.charset.UnsupportedCharsetException: The unsupported
> > > > > > > > > > > >> > > > charset name is "UnicodeBig"
> > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > It seems the RI recognizes *UnicodeBig* in the constructor of
> > > > > > > > > > > >> > > > j.l.String, whereas Harmony does not support this alias at all.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Do you have any concerns about that?
> > > > > > > > > > >> > > > --
> > > > > > > > > > >> > > > Tony Wu
> > > > > > > > > > > >> > > > China Software Development Lab, IBM
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > ---------------------------------------------------------------------
> > > > > > > > > > > >> > > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > > > > > > > > > >> > > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > > > > > > > > > > >> > > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > --
> > > > > > > > > > >> > > Leo Li
> > > > > > > > > > > >> > > China Software Development Lab, IBM
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >> >
> > > > > > > > > > >> > --
> > > > > > > > > > >> > Best regards,
> > > > > > > > > > >> > Andrew Zhang
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> Best regards,
> > > > > > > > > > >> Andrew Zhang
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Paulex Yang
> > > > > > > > > > China Software Development Lab
> > > > > > > > > > IBM
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Tony Wu
> > > > > > > > > China Software Development Lab, IBM
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Andrew Zhang
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Tony Wu
> > > > > > > China Software Development Lab, IBM
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > Terms of use :
> http://incubator.apache.org/harmony/mailing.html
> > > > > > > To unsubscribe, e-mail:
> > > harmony-dev-unsubscribe@incubator.apache.org
> > > > > > > For additional commands, e-mail:
> > > harmony-dev-help@incubator.apache.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew Zhang
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Tony Wu
> > > > > China Software Development Lab, IBM
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew Zhang
> > > >
> > > >
> > >
> > >
> > > --
> > > Tony Wu
> > > China Software Development Lab, IBM
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrew Zhang
> >
> >
>
>
> --
> Tony Wu
> China Software Development Lab, IBM
>



-- 
Best regards,
Andrew Zhang
