cocoon-users mailing list archives

From Argyn Kuketayev <Argyn.Kuketa...@plateau.com>
Subject RE: C2.0.1 ESQL/XSP + UTF-8 encoded Japanese characters in Oracle
Date Fri, 07 Jun 2002 22:24:08 GMT
As a workaround, I wrote a logicsheet that converts the wrong Strings into the
right ones, and I call it every time I access a String column. Obviously, this
slows down processing, and I have large amounts of data for reporting:
megabytes, actually.
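The message does not show the logicsheet itself. Below is a minimal Java sketch of the kind of conversion such a logicsheet could call. It assumes the driver's output is equivalent to decoding the stored UTF-8 bytes as ISO-8859-1 (one char per byte, empty high byte), as described in the quoted message. `MojibakeRepair.repair` is a hypothetical name, not part of Cocoon or the Oracle driver. The sketch uses `StandardCharsets` (Java 7+). On older JDKs, the charset-name overloads `getBytes("ISO-8859-1")` and `new String(bytes, "UTF-8")` do the same thing.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Cocoon or the Oracle JDBC driver) that
 * undoes the one-char-per-byte decoding described in this thread.
 */
public class MojibakeRepair {

    // Assumes every char in `wrong` lies in U+0000..U+00FF and holds one raw
    // UTF-8 byte in its low byte. A value that already contains chars above
    // U+00FF would have those chars turned into '?' by getBytes() below.
    public static String repair(String wrong) {
        if (wrong == null) {
            return null; // getString() returns null for SQL NULL columns
        }
        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        // so this recovers the original UTF-8 byte sequence unchanged.
        byte[] utf8 = wrong.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes as what they really are: UTF-8.
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00E6\u0097\u00A5" is what the driver would hand back for U+65E5.
        String wrong = "\u00E6\u0097\u00A5";
        System.out.println(repair(wrong).equals("\u65E5")); // prints "true"
        System.out.println(repair("abc"));                  // prints "abc"
    }
}
```

Each call copies the string into a byte array and decodes it again, so the extra work is proportional to the amount of text processed. Because of the '?' caveat noted in the code, the conversion should only be applied to values that actually came back mis-decoded.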

> -----Original Message-----
> From: Argyn Kuketayev [mailto:Argyn.Kuketayev@plateau.com]
> Sent: Friday, June 07, 2002 6:14 PM
> To: Cocoon-Users (E-mail)
> Subject: C2.0.1 ESQL/XSP + UTF-8 encoded Japanese characters in Oracle
> 
> 
> Here's my problem:
> 
> I use esql tags inside XSP to generate XML from the Oracle database with
> UTF-8 encoding. English characters work fine.
> 
> Then comes the issue with Japanese characters:
> 1. Each Chinese or Japanese character is encoded as 3 bytes in UTF-8 and
> stored that way in the varchar2 column.
> 2. When I use the esql logicsheets to build XML from Oracle, the XSP is
> converted into a Java file. Inside the Java code I can see that it uses the
> getString() method of the ResultSet object.
> Unfortunately, getString() returns a String with three characters for
> every Chinese character. What happens is that the Oracle JDBC driver makes
> one character for every byte in the database. So each character has an
> empty high byte, and its low byte is one of the bytes of the UTF-8
> representation of the Chinese character.
> Cocoon then gets the incorrect String and writes &#NUMBER; for every
> character, so the XML contains completely wrong strings.
> 
> I couldn't make Oracle return a correct String representation of the data
> in the database. Changing the regional settings is not an option.
> 
> Now, our Web application (without Cocoon) works fine! How come?
> I'll explain:
> getString() returns the WRONG string, with three characters per Chinese
> character. Then the JSP page uses the out.print() method. This method
> thinks "I'm English, I see three characters, I have to convert them into
> English. I'll simply cut off the upper bytes." So print writes out just
> three bytes, and they are the RIGHT bytes! Then the browser sees that the
> page is UTF-8 encoded, takes the three bytes, and shows them as ONE
> correct Chinese character.
> 
> The question is: what shall I do?
> 
> 1. If I somehow make getString() return the correct String, then my JSPs
> will apparently break: they will try to print the correct character by
> cutting off the upper byte.
> 
> 2. If I change Cocoon to use something similar to JSP's out.print(), it
> may break when somebody changes the regional settings (?).
> 
> Argyn
> 
> 
> ---------------------------------------------------------------------
> Please check that your question has not already been answered in the
> FAQ before posting. <http://xml.apache.org/cocoon/faqs.html>
> 
> To unsubscribe, e-mail: <cocoon-users-unsubscribe@xml.apache.org>
> For additional commands, e-mail: <cocoon-users-help@xml.apache.org>
> 
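The byte-level behavior described in the quoted message can be reproduced without Oracle. Below is a minimal Java sketch under two assumptions:

- The driver's behavior is equivalent to decoding the stored bytes as ISO-8859-1 (one char per byte, empty high byte).
- The JSP writes its output as ISO-8859-1, the servlet default when no charset is set.

The class and method names are hypothetical. `toCharRefs` only approximates how a serializer might escape non-ASCII characters; it is not Cocoon's serializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical walkthrough of the encoding behavior described in the thread. */
public class EncodingWalkthrough {

    // Simulates the reported driver behavior: one char per stored byte with
    // an empty high byte, which is exactly what ISO-8859-1 decoding yields.
    static String decodeOneCharPerByte(byte[] stored) {
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    // Writes every non-ASCII char as a decimal character reference (&#N;);
    // ASCII chars are copied as-is (no escaping of '<' or '&' in this demo).
    static String toCharRefs(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    // Encodes text the way a page writer with the given charset would.
    // OutputStreamWriter replaces unmappable chars with the charset's
    // replacement bytes ('?' for ISO-8859-1) instead of throwing.
    static byte[] writeWith(String text, Charset pageCharset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, pageCharset)) {
            out.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String correct = "\u65E5";                                 // one kanji, U+65E5
        byte[] stored = correct.getBytes(StandardCharsets.UTF_8);  // E6 97 A5

        // The problem: three chars come back instead of one.
        String fromDriver = decodeOneCharPerByte(stored);
        System.out.println(fromDriver.length());                   // 3
        System.out.println(toCharRefs(fromDriver));                // &#230;&#151;&#165;
        System.out.println(toCharRefs(correct));                   // &#26085;

        // Why the JSP works: an ISO-8859-1 writer drops the empty high bytes,
        // so the original UTF-8 bytes reach the browser unchanged.
        byte[] fromJsp = writeWith(fromDriver, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(stored, fromJsp));        // true

        // Option 1 concern: the correct string has no ISO-8859-1 form.
        byte[] broken = writeWith(correct, StandardCharsets.ISO_8859_1);
        System.out.println(new String(broken, StandardCharsets.ISO_8859_1)); // ?

        // With UTF-8 as the page encoding, the correct string is written correctly.
        byte[] fixed = writeWith(correct, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(stored, fixed));          // true
    }
}
```

If those assumptions hold, option 1 is only safe if the JSPs also switch their output encoding to UTF-8 when the driver is fixed, for example with a `contentType="text/html; charset=UTF-8"` page directive.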