cocoon-users mailing list archives

From Niclas Hedhman
Subject Re: Serious Encoding Problems (Umlaute)
Date Tue, 10 Jun 2003 03:11:47 GMT
On Saturday 07 June 2003 03:17 am, Alexander Schatten wrote:
> Joerg Heinicke wrote:
> > Alexander Schatten wrote:
> >> (1) UTF-8 practically only works for English texts, and does not work
> >> with ae oe ue and so on
> >
> > That's wrong. UTF-8 works for *every* character. You just have to use it
> > correctly - and that's not so easy :-)
> > By default, a browser that is given a UTF-8 document will send forms
> > encoded in UTF-8 too, but Cocoon expects ISO-8859-1. You can change
> > this by setting the form encoding correctly.
> Well, as I have mentioned, I am definitely no encoding expert, but my
> practical experience with different tools(!), not only with
> Cocoon, shows me that UTF-8 does not work in practice with, e.g., German
> umlauts. ISO-8859-1 does. That's a fact. Maybe there are problems in
> implementations, I don't know, but this is what I experienced.

Could it possibly be your understanding of encodings that is a bit flawed?

Unicode code points are fixed in stone, and Java uses Unicode internally for
every String and Character. However, a Unicode-to-encoding conversion is
ALWAYS performed when the characters are output to some other data medium.
It is per definition

ISO-8859-1 defines a fixed set of characters. In XML or HTML output, all
other characters are encoded as "numeric text", i.e. numeric character
references such as &#20013;. The encoding itself has no such escape; it is
the markup layer that provides a way of representing characters that are not
part of the "encoding scope", such as Chinese characters in ISO-8859-1.
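
As a rough illustration of the two points above, here is a minimal Java
sketch. The class and method names (EncodingScopeDemo, hex, escapeFor) are
made up for this example and are not Cocoon APIs; it also assumes a Java
version that has java.nio.charset.StandardCharsets and String.codePoints().
It shows that the same umlaut becomes different bytes in different
encodings, and how a character outside ISO-8859-1 can be written as a
numeric character reference instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class EncodingScopeDemo {

    // Hex dump of the bytes that `text` becomes in the given charset.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hypothetical helper: keeps characters the charset can encode and
    // writes every other character as a decimal numeric character reference.
    static String escapeFor(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                sb.append(ch);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4";           // a-umlaut, written as an escape
        String mixed = "\u00e4\u4e2d";      // a-umlaut plus one Chinese character

        System.out.println(hex(umlaut, StandardCharsets.UTF_8));       // two bytes
        System.out.println(hex(umlaut, StandardCharsets.ISO_8859_1));  // one byte
        System.out.println(escapeFor(mixed, StandardCharsets.ISO_8859_1));
    }
}
```

The string literals use \u escapes so that the example does not depend on
the encoding of the source file itself, which is one more place where such
problems tend to hide.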

To make matters worse, or at least more confusing, the characters are
eventually displayed or printed for a human, in which case a graphical
representation must be available for the character in question. That
representation is supplied by a font.

Sad to say, few tools today support all character encodings, and few fonts
cover all characters.

I believe that in this jungle of confusion, you have misunderstood how to use
character encodings. It is easy to do; I have done it myself many times.

For instance, MySQL must be set up to support UTF encodings (it doesn't do
that by default), and the JDBC connect string must specify that the driver
will use UTF. Forget that, and everything seems like it doesn't work at all.
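
A connect string of the kind meant here might look like the sketch below.
The useUnicode and characterEncoding properties are the ones MySQL
Connector/J documents for this purpose; the class name, host and database
name are placeholders invented for the example, and the server-side setup
is not shown.

```java
class MySqlUrlSketch {

    // Builds a JDBC URL that asks the MySQL driver to exchange text as UTF-8.
    // `host` and `database` are placeholders supplied by the caller.
    static String utf8Url(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Hypothetical host and database name, for illustration only.
        System.out.println(utf8Url("localhost", "mydb"));
    }
}
```

The resulting string would then be passed to DriverManager.getConnection
together with the usual user name and password.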

I suggest that you slowly go through each part of your system and verify
which character encoding it uses. There should be no problem mixing them, e.g.
having ISO-8859-1 documents, which are easier to type, and serving UTF-8 to
the web browsers.
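
Mixing the two encodings as described above comes down to decoding with the
charset the bytes were written in and encoding again with the charset the
consumer expects. The following is only a sketch with made-up names
(TranscodeSketch, latin1ToUtf8, misreadUtf8AsLatin1), not how Cocoon does it
internally. The second method shows the typical symptom when one link in
the chain assumes the wrong encoding.

```java
import java.nio.charset.StandardCharsets;

class TranscodeSketch {

    // Correct path: decode ISO-8859-1 bytes into a Java String (Unicode),
    // then encode that String as UTF-8 for the browser.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Faulty path: UTF-8 bytes decoded as if they were ISO-8859-1.
    // Each umlaut turns into two unrelated characters.
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin1 = { (byte) 0xfc };             // u-umlaut in ISO-8859-1
        byte[] utf8 = latin1ToUtf8(latin1);
        System.out.println(utf8.length);             // number of UTF-8 bytes
        System.out.println(misreadUtf8AsLatin1("\u00fc"));
    }
}
```

If a page shows two odd characters where one umlaut was expected, a step
like the second method is probably happening somewhere between the source
document and the browser.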

