From: "Victor Smirnov"
To: "Cocoon-dev"
Subject: Remarks on i18n
Date: Mon, 31 Jan 2000 20:08:44 +0500

Hello!

The problem, IMO, is rather funny. Take /samples/hello/hello-page.xml and change the greeting to some Russian text. The browser then shows something like '???? ??????', which is not what we want.

I started experimenting with this and tried the following change:

    diff -r1.10 Engine.java
    316c316
    < out.println(page.getContent());
    ---
    > out.println(new String(page.getContent().getBytes(), 0));

This is the place where the results are sent back to the client. After applying this change, I got correct Russian text. So I suppose that the servlet engine expects a string in "plain ASCII" and fails to convert anything else.

I looked at the servlet API (SDK 2.0) and found that there is no way to output a byte array (or byte stream) to the user, only a string, and there is no way to set the encoding of the resulting byte array.
(Correct me if I'm wrong.)

To overcome this, we could have a property that sets the result encoding and, instead of calling page.getContent().getBytes(), convert the result string to a byte array in the proper encoding. Probably this is just a limitation of the current version of the Servlet API and will be fixed somehow in the future. Maybe someone can confirm this?

IMO this is a dirty hack, but do you have better ideas?

- Victor
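
P.S. A minimal sketch of the "encoding property" idea, written against plain java.io so it does not depend on any servlet class. The class name EncodingSketch, the methods encode and writeEncoded, and the encoding parameter (standing in for the proposed property) are all hypothetical names for illustration, not part of Cocoon or the Servlet API. It also assumes a byte-oriented stream to the client is available.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Cocoon: encodes page content with an
// explicitly named charset instead of the platform default that the
// no-argument String.getBytes() uses.
class EncodingSketch {

    // Convert the content to bytes in the named encoding (for example
    // "UTF-8"). The name would come from the proposed property.
    // Characters the charset cannot represent are replaced with the
    // charset's default replacement, which is '?' for US-ASCII.
    static byte[] encode(String content, String encoding) {
        return content.getBytes(Charset.forName(encoding));
    }

    // Write the encoded bytes to any byte-oriented stream.
    static void writeEncoded(OutputStream out, String content, String encoding)
            throws IOException {
        out.write(encode(content, encoding));
        out.flush();
    }
}
```

For a six-letter Russian word, encode(word, "UTF-8") gives 12 bytes that decode back to the same text, while encode(word, "US-ASCII") gives six '?' bytes, which looks like the symptom described above.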