From: Cédric Chabanois
To: 'axis-dev@ws.apache.org'
Subject: RE: bug #24896 : I don't understand what we are doing in AbstractXMLEncoder
Date: Thu, 27 Nov 2003 14:50:39 +0100

I still don't understand why we use UTF-8 or UTF-16 there ...

Concerning what we need to escape, this is described at
http://www.w3.org/TR/REC-xml#syntax
Valid characters are listed at http://www.w3.org/TR/REC-xml#charsets

UTF-8 aside, I think we did the right thing.

However, I think that

  private static final byte[] AMP = "&amp;".getBytes();

is not valid, because getBytes() uses the platform's default charset. It should probably be

  AMP = "&amp;".getBytes("UTF-8");     for UTF-8
and
  AMP = "&amp;".getBytes("UTF-16");    for UTF-16

Concerning the tests that failed after my patch, I understand why they failed.
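Regarding the AMP constant and the escaping rules above, here is a minimal sketch of the alternative the thread hints at: escape on a java.lang.String, then convert to bytes only once, naming the charset explicitly instead of relying on the platform default. The class and method names (XmlEscapeSketch, escape, encode) are hypothetical and are not the Axis AbstractXMLEncoder API. One caveat on the UTF-16 suggestion: Java's "UTF-16" charset writes a byte-order mark on every getBytes call, so a per-constant "&amp;".getBytes("UTF-16") would put a BOM in front of each entity; "UTF-16BE" or "UTF-16LE" does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Axis API: escape at the String level,
// then encode once with an explicit charset.
public class XmlEscapeSketch {

    // Replaces the four characters discussed in this thread with their
    // predefined XML entities; every other character is copied unchanged.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Converts to bytes exactly once, with the charset named explicitly
    // rather than the platform default used by String.getBytes().
    static byte[] encode(String text, Charset charset) {
        return escape(text).getBytes(charset);
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & \"c\"")); // a &lt; b &amp; &quot;c&quot;

        // "&amp;" is 5 bytes in UTF-8 and 10 bytes in UTF-16BE.
        System.out.println("&amp;".getBytes(StandardCharsets.UTF_8).length);    // 5
        System.out.println("&amp;".getBytes(StandardCharsets.UTF_16BE).length); // 10
        // The "UTF-16" charset prepends a 2-byte byte-order mark: 12 bytes.
        System.out.println("&amp;".getBytes(StandardCharsets.UTF_16).length);   // 12
    }
}
```

Working on Strings until the very end means the only charset decision is made in one place, at output time.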
In EncodingTest.testUTF8,

  assertEquals(GERMAN_UMLAUTS, new String(encodedUmlauts.getBytes(), XMLEncoderFactory.ENCODING_UTF_8));

should be

  assertEquals(GERMAN_UMLAUTS, new String(encodedUmlauts.getBytes(XMLEncoderFactory.ENCODING_UTF_8), XMLEncoderFactory.ENCODING_UTF_8));

or (simpler)

  assertEquals(GERMAN_UMLAUTS, encodedUmlauts);

However, it does not test much: it only checks that the string passed to encoder.encode (which does not need to be escaped) is returned unmodified.

Cédric

> -----Original Message-----
> From: Davanum Srinivas [mailto:dims@yahoo.com]
> Sent: Wednesday, 26 November 2003 17:01
> To: axis-dev@ws.apache.org
> Subject: Re: bug #24896 : I don't understand what we are doing in
> AbstractXMLEncoder
>
> See http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19327
> for more info.
>
> --- Cédric Chabanois wrote:
> > Hi all,
> >
> > My correction for bug #24896 worked, i.e. the XML sent is in UTF-8 format
> > (before, French accents, Chinese characters ... were not transmitted
> > correctly), but I don't really understand what we are doing in
> > AbstractXMLEncoder and UTF8Encoder:
> > the encode method takes a Java String.
> > This string is converted to a byte array in UTF-8 (using
> > String.getBytes("UTF-8")) and
> >   &  becomes "&amp;"
> >   "  becomes "&quot;"
> >   <  becomes "&lt;"
> >   >  becomes "&gt;"
> > All other characters are encoded using UTF-8 (appendEncoded method in
> > UTF8Encoder).
> >
> > Then the bytes are converted back to a string (using the UTF-8 charset
> > since my patch, and the platform's default charset before my patch: the
> > bytes were not valid for the default charset).
> >
> > I wonder why we build a UTF-8 byte array there just to convert it back to
> > a string afterwards, since all we do is replace some characters
> > (& -> &amp; ...).
> >
> > There is probably something I missed somewhere ...
> >
> > Cédric
>
> =====
> Davanum Srinivas - http://webservices.apache.org/~dims/
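The EncodingTest.testUTF8 failure described above can be reproduced in isolation. In this sketch, GERMAN_UMLAUTS is a hypothetical stand-in for the test constant, and ISO-8859-1 is passed explicitly to simulate a platform whose default charset is not UTF-8 (the original getBytes() call silently uses whatever that default is). The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the EncodingTest.testUTF8 round-trip problem.
public class RoundTripSketch {

    // Stand-in value; the real constant lives in EncodingTest.
    static final String GERMAN_UMLAUTS = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";

    // Encodes with one charset and decodes with another, which is what the
    // original assertion does when getBytes() uses a non-UTF-8 default.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        // Mismatched charsets: ISO-8859-1 bytes are not valid UTF-8, so the
        // umlauts come back as U+FFFD replacement characters.
        String broken = roundTrip(GERMAN_UMLAUTS,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(GERMAN_UMLAUTS.equals(broken)); // false

        // Matching charsets (the corrected assertion): the string survives.
        String fixed = roundTrip(GERMAN_UMLAUTS,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(GERMAN_UMLAUTS.equals(fixed)); // true
    }
}
```

The simpler assertEquals(GERMAN_UMLAUTS, encodedUmlauts) sidesteps the byte round trip entirely, which is why it is also immune to the platform default.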