Delivered-To: mailing list tomcat-dev@jakarta.apache.org
Message-ID: <005901c09002$a963d3c0$22f6493f@home>
From: "Tim Tye"
References: <001601c08f5d$e264ef10$6f01a8c0@bequbed.com>
Subject: Re: serializing XML to a ServletOutputStream fails
Date: Tue, 6 Feb 2001 00:04:26 -0600

UTF-16 is not an acceptable encoding for XML as it takes two bytes per
character, is byte order sensitive, and the XML tags would not be
recognized...

UTF-8 is the correct encoding! Any 31-bit character in the ISO 10646
specification can be correctly represented in UTF-8. Unicode is the
first 65536 characters of ISO 10646.

A CJK character code point value of 0x6123 is represented in UTF-8 as
three bytes: E6 84 A3.

What byte values are you seeing for the encoding of a given Chinese code
point?

----- Original Message -----
From: Zhu Ming
Sent: Monday, February 05, 2001 4:24 AM
Subject: RE: serializing XML to a ServletOutputStream fails

> Hi,
>
> Maybe you should not use character set "UTF-8". I remember
> that it's 8-bit Unicode. As far as I know, Chinese and Korean have
> 16-bit codes. So at least, you should try 16-bit Unicode.
> I forgot the name, maybe it's "UTF-16". But I'm not sure if
> the JDK fully supports "UTF-16".
>
> I'm not a Unicode expert.
> I'll be happy if what I say can
> be a hint to solve this problem.
>
> Ming
>
>
> -----Original Message-----
> From: Michael Mealling [mailto:michael@bailey.dscga.com]
> Sent: Monday, February 05, 2001 03:04
> To: tomcat-dev@jakarta.apache.org
> Subject: serializing XML to a ServletOutputStream fails
>
>
> (This might be a bug so I'm cc-ing to tomcat-dev)
> Hi,
> I'm trying to serialize some XML out to a ServletOutputStream but
> the resulting XML on the client side contains corrupted Unicode
> characters (the DOM I'm serializing out contains Chinese, Korean,
> English, etc). Here's the code in question:
>
> response.setContentType("text/xml; charset=UTF-8");
> ServletOutputStream out = response.getOutputStream();
>
> out.print("\n" +
> " " \"http://www.ietf.org/cnrp.dtd\">\n");
> out.flush();
> OutputFormat format = new OutputFormat(document);
> format.setOmitXMLDeclaration(true);
> format.setIndenting(true); // it makes debugging easier
> format.setEncoding("UTF-8"); // this is the default anyway
> XMLSerializer serializer = new XMLSerializer(out, format);
> serializer.serialize(document.getDocumentElement());
>
> The XML that the client gets is fine except that the non-ASCII subset
> of the UTF-8 encoded Unicode characters are garbled. I can serialize
> the XML out to a FileOutputStream and it works just fine.
>
> I'm running Tomcat 3.2.1 that's the backend for a remote
> Apache 1.3.17 server using ajp13 (and thus mod_jk).
>
> This code looks like it's the right way to do this but either
> I've hit a bug or else I'm missing something (an encoding somewhere
> between a Stream and a Writer?)
>
> -MM
>
> --
> --------------------------------------------------------------------------
> Michael Mealling | Vote Libertarian! | www.rwhois.net/michael
> Sr. Research Engineer | www.ga.lp.org/gwinnett | ICQ#: 14198821
> Network Solutions | www.lp.org | michaelm@netsol.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> For additional commands, email: tomcat-dev-help@jakarta.apache.org
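
For the byte-value question in the reply at the top of this message, a
minimal sketch of how one might print the expected bytes for a code point
so they can be compared with what the client actually receives. It assumes
a JDK that provides java.nio.charset.StandardCharsets (newer than the JDKs
in use when this thread was written); the class name Utf8Hex and the helper
toHex are hypothetical names chosen for illustration, not part of Tomcat or
Xerces.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8Hex {
    /** Encodes text in the given charset and returns the bytes as spaced uppercase hex. */
    static String toHex(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ch = "\u6123"; // the code point used as the example in the reply

        // UTF-8: three bytes, independent of byte order.
        System.out.println(toHex(ch, StandardCharsets.UTF_8));    // E6 84 A3

        // UTF-16: two bytes whose order depends on endianness.
        System.out.println(toHex(ch, StandardCharsets.UTF_16BE)); // 61 23
        System.out.println(toHex(ch, StandardCharsets.UTF_16LE)); // 23 61
    }
}
```

If the bytes captured on the client side for the same character differ
from the UTF-8 line, the corruption is happening somewhere between the
serializer and the client rather than in the choice of encoding.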