cocoon-users mailing list archives

From Argyn Kuketayev
Subject C2.0.1 ESQL/XSP + UTF-8 encoded Japanese characters in Oracle
Date Fri, 07 Jun 2002 22:14:02 GMT
Here's my problem:

I use esql tags inside XSP to generate XML from the Oracle database with
UTF-8 encoding. English characters work fine.

Then comes the issue with Japanese characters:
1. Every Chinese or Japanese character is encoded as 3 bytes in UTF-8 and
stored in a varchar2 column.
2. When I use the esql logicsheet to make an XML file from Oracle, the XSP is
converted into a Java file. Inside the Java I can see that it uses the
getString() method of the ResultSet object.
Unfortunately, getString() returns a String object with three characters for
every Chinese character. What happens is that the Oracle JDBC driver makes one
character for every byte in the database. So each character has an empty high
byte, and its low byte is one of the bytes of the UTF-8 representation of the
Chinese character.
Then Cocoon gets the incorrect String and writes &#NUMBER; for every
character, so the XML has totally wrong strings.
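
The effect described above can be reproduced without a database. The following is a minimal sketch, not the driver's actual code: it assumes the driver behaves like a byte-per-character (ISO-8859-1 style) decoder, and it assumes a JDK with java.nio.charset.StandardCharsets (Java 7 or later). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {
    // Assumption: models the reported behaviour, one char per stored byte
    // with the high byte left at zero (equivalent to ISO-8859-1 decoding).
    static String byteWise(byte[] stored) {
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    // Writes every non-ASCII char as a numeric character reference,
    // roughly what an XML serializer does for an ASCII-only output.
    static String toCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append("&#").append((int) c).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+65E5 is stored as the three UTF-8 bytes E6 97 A5.
        byte[] stored = "\u65E5".getBytes(StandardCharsets.UTF_8);
        String wrong = byteWise(stored);
        System.out.println(wrong.length());       // prints 3
        System.out.println(toCharRefs(wrong));    // prints &#230;&#151;&#165;
        System.out.println(toCharRefs("\u65E5")); // prints &#26085;
    }
}
```

The last line shows the single reference a correctly decoded String would produce, for comparison with the three references produced from the byte-wise String.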

I couldn't make Oracle return the correct String representation of the data in
the database. Changing the regional settings is not an option.

Now, our Web application (without Cocoon) works fine! How come?
I'll explain:
getString() returns the WRONG string, with three characters per Chinese
character. Then the JSP page uses the out.print() method. This method thinks
"I'm English, I see three characters, I have to convert them into English.
I'll simply cut off the upper bytes". So print() emits just three bytes, and
they are the RIGHT bytes! Then the browser sees that the page is UTF-8
encoded, takes the three bytes, and shows them as ONE correct Chinese character.
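
The accidental round trip in the JSP case can be sketched as follows. This is an illustration under assumptions, not the JSP container's real code path: it assumes the page writer encodes characters the way ISO-8859-1 does (each char below 256 becomes its low byte), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class JspRoundTripDemo {
    // Assumption: stands in for out.print() with a single-byte page
    // encoding, where each char below 256 is written as its low byte.
    static byte[] printLowBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Stands in for the browser, which decodes the response as UTF-8.
    static String browserDecode(byte[] response) {
        return new String(response, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = "\u65E5".getBytes(StandardCharsets.UTF_8);
        // The byte-wise String that getString() is reported to return.
        String wrong = new String(stored, StandardCharsets.ISO_8859_1);

        byte[] response = printLowBytes(wrong);
        System.out.println(Arrays.equals(stored, response));           // prints true
        System.out.println(browserDecode(response).equals("\u65E5"));  // prints true
    }
}
```

Two wrong steps cancel out here: the bytes that reach the browser are the same bytes that were stored in the column.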

The question is: what shall I do?

1. If I somehow make getString() return the correct String, then it seems
my JSPs will break: they will try to print the correct character by cutting
off the upper byte.

2. If I change Cocoon to use something similar to out.print() from JSP, then
it may break when somebody changes the regional settings (?).
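
One possible direction for the second option is a defensive re-decoding helper applied to the value returned by getString(). This is only a sketch under assumptions, not an esql or Oracle feature: Utf8Repair and repair() are hypothetical names, and the helper assumes that a String made only of chars below 256 whose low bytes form valid UTF-8 was decoded byte-wise. It leaves every other String untouched, so it would keep working if the driver later started returning correct Strings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Repair {
    // Hypothetical helper: re-decodes a byte-wise String as UTF-8 when
    // that is possible, otherwise returns the input unchanged.
    static String repair(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s; // already contains real wide characters
            }
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return s; // low bytes are not valid UTF-8, leave as is
        }
    }

    public static void main(String[] args) {
        // Bytes E6 97 A5 held as three separate chars.
        String wrong = "\u00E6\u0097\u00A5";
        System.out.println(repair(wrong).equals("\u65E5"));    // prints true
        System.out.println(repair("\u65E5").equals("\u65E5")); // prints true
        System.out.println(repair("abc"));                     // prints abc
    }
}
```

The heuristic has a known weakness: genuine Latin-1 text whose bytes happen to form valid UTF-8 would be altered, so it only suits columns known to hold UTF-8 data.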

