tomcat-dev mailing list archives

From Stefanos Karasavvidis <>
Subject Fwd: JSR 154 Comments - setCharacterEncoding
Date Tue, 13 Jan 2004 12:45:30 GMT
As I said in a previous posting, I had sent an email to JSR 154 about the
setCharacterEncoding issue. I got the following answer from Yutaka
Yoshida, who notes that it expresses his personal opinion:


I share the same thought. I agree we need a facility to handle
this case, as many people pass non-ISO-8859-1 characters in GET parameters.
Also, no, I don't mind your forwarding my message, but please
make clear that this is my personal opinion.

thank you,
Yutaka Yoshida
Sun Microsystems, Inc.

Stefanos Karasavvidis wrote:

 > Hi Yuta,
 > Thank you for the reply.
 > Your point of view matches that of the Tomcat developers, and I
 > don't really have any objection as far as standard conformance is
 > concerned.
 > BUT:
 > First of all, I assume you agree that passing parameters with the
 > GET method is at least useful (IMO it is a necessity). Consider how
 > you would pass someone a URL to a dynamically generated page that is
 > based on GET parameters.
 > So there has to be a way to correctly encode text values within the
 > URI. Prior to Servlet/JSP 2.3/1.2, every vendor had its own way to
 > work around this issue (if there was any), and every developer had to
 > "manually" decode the values and hope that the servlet container did
 > not have its own way of doing the decoding (e.g. Sun Web Server).
 > We "non-Latin" developers could not use automatic form-handling
 > applications (they just called getParameter), and EVERY new web
 > application introduced a new way of handling these issues.
 > The introduction of the setCharacterEncoding method was such a BIG
 > relief for us, and now we have to return to old-style coding
 > methods again (I personally have been using servlets since 1998).
 > I also used to teach an undergraduate "web applications" laboratory
 > at the Technical University of Crete (trying to use Java), and it was
 > always difficult enough to explain to the students that "1 byte is NOT
 > 1 character" when they tried to pass Greek text values. Now we have to
 > explain to them that the way they learned to deal with encodings
 > within the servlet spec has to change again. From a teaching point of
 > view this is not a problem (it is actually legitimate). But it is not
 > legitimate for the students to have to put in so much effort just to
 > get a parameter... these are issues that should have been easy to
 > handle a long time ago (and they were, until now).
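A minimal Java sketch makes the "1 byte is NOT 1 character" point concrete by comparing the character count of a Greek word with its octet count in two encodings. The word, the class name `BytesVsChars`, and the choice of charsets are illustrative assumptions, not taken from the message.

```java
import java.nio.charset.StandardCharsets;

public class BytesVsChars {
    // "Kalimera" ("good morning") in Greek, written with Unicode escapes
    // so the example compiles regardless of the source-file encoding.
    static final String GREEK = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";

    public static void main(String[] args) {
        // Every Greek letter here needs two octets in UTF-8.
        byte[] utf8 = GREEK.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 has no Greek letters; getBytes substitutes '?' for each one.
        byte[] latin1 = GREEK.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("characters:            " + GREEK.length()); // 8
        System.out.println("UTF-8 octets:          " + utf8.length);    // 16
        System.out.println("ISO-8859-1 octets:     " + latin1.length);  // 8
        System.out.println("ISO-8859-1 round trip: "
                + new String(latin1, StandardCharsets.ISO_8859_1));    // ????????
    }
}
```

Only UTF-8 and ISO-8859-1 are used because every Java platform is guaranteed to support them; a Greek single-byte charset such as ISO-8859-7 would give one octet per letter but is not guaranteed to be installed.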
 > Anyway...
 > Introducing a new method is IMO the right way to go, and this should
 > be done fast, because there has been a lot of confusion and even
 > damage. Leaving this unaddressed and hoping that every developer will
 > become an expert in character-encoding issues is IMHO not acceptable.
 > It is not enough to state that Java, as a language and as a web
 > application development framework, can handle internationalized
 > applications. These simple everyday problems should have a
 > consistent and unchallengeable way of being handled.
 > Thank you for your time
 > Do you mind if I forward this message to the tomcat-dev list?
 > Regards
 > Stefanos Karasavvidis
 > Yuta Yoshida wrote:
 >> Hi Stefanos,
 >> I personally believe setCharacterEncoding() should only affect
 >> the body as stated in javadoc, in other words, POST. Because:
 >>   o your second paragraph below
 >>   o if this method affected the URI too, it would introduce another
 >>     meaning. As you know, there are two mappings in a URI. One is
 >>     from the characters in the URI to the octets and the other is
 >>     from the octets to the original characters. Set[ting]CharacterEncoding
 >>     of the POST body is direct: the body is actually encoded in
 >>     the encoding scheme specified by the method. However, doing so
 >>     for the GET query parameters is not direct: they are encoded in
 >>     ASCII, but the method specifies the encoding of the original
 >>     characters. That's confusing.
 >> Considering that the original encoding of the GET URI doesn't have to
 >> be the same as that of the POST body, we might need a new method
 >> to specify the GET encoding. But I'm not sure how useful it would be,
 >> since all we have to do is re-create a String from getBytes.
 >> Anyway, I'll put this into the list we need to address in the next
 >> version of the specification. I understand most containers currently
 >> implement this method for both POST and GET and we need to take that
 >> fact into consideration.
 >> Thank you for the comment,
 >> Yutaka Yoshida
 >> Sun Microsystems, Inc.
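The two mappings Yoshida describes, and the getBytes workaround he refers to, can be sketched in Java. The sketch assumes the client percent-encoded the UTF-8 octets of a Greek word and the container decoded the query with ISO-8859-1; `RedecodeParameter`, `containerDecode`, and `redecode` are hypothetical names, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedecodeParameter {
    // Assumed raw query value: the UTF-8 octets of the Greek word "Kalimera"
    // (characters -> octets), each written as %XX in ASCII (octets -> URI).
    static final String RAW = "%CE%9A%CE%B1%CE%BB%CE%B7%CE%BC%CE%AD%CF%81%CE%B1";

    // Simulates a container that decodes the query with ISO-8859-1:
    // every octet becomes one char, so the Greek text comes out garbled.
    static String containerDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    // The workaround: recover the original octets (ISO-8859-1 maps each char
    // U+0000..U+00FF back to a single octet) and decode them with the real charset.
    static String redecode(String wronglyDecoded, Charset actual) {
        byte[] octets = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(octets, actual);
    }

    public static void main(String[] args) {
        String garbled = containerDecode(RAW);
        String fixed = redecode(garbled, StandardCharsets.UTF_8);
        System.out.println(garbled.length()); // 16: one char per octet
        System.out.println(fixed.length());   // 8: one char per Greek letter
        System.out.println(fixed.equals("\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1")); // true
    }
}
```

The round trip works only because ISO-8859-1 maps every octet 0x00 to 0xFF to exactly one character and back, and it recovers the right text only if the client really used UTF-8; the application still has to know which encoding the client chose.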
 >>> There has been a dispute lately on the Tomcat development list
 >>> about whether the
 >>> Request.setCharacterEncoding(String encoding)
 >>> method sets the encoding for both HTTP GET and POST parameters, or
 >>> only for HTTP POST parameters.
 >>> The developers argued that, as there is no standard way of encoding
 >>> characters in the URI, the query string of the URI (the GET
 >>> parameters) cannot be encoded differently from the first part of
 >>> the URI. Thus the encoding set by setCharacterEncoding is applied
 >>> ONLY to POST parameters.
 >>> This change of behaviour has been applied to Tomcat versions 4.1.29
 >>> and 5.0.16. A special Tomcat configuration parameter that restores
 >>> the old behaviour has also been added (it will not be available until
 >>> the next versions are released), but by default GET parameters will
 >>> still not be decoded with the encoding set by the method.
 >>> A list of bugs filed on this issue is available in the following
 >>> posting, and many related messages exist in the developer list
 >>> archives (search for "setCharacterEncoding").
 >>> As this change in the reference implementation breaks with the
 >>> common behaviour of other servlet engines (as well as with the
 >>> behaviour of Tomcat releases prior to the latest ones), I ask you
 >>> to clarify this issue.
 >>> Regards
 >>> Stefanos Karasavvidis
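Both correspondents end up suggesting a separate way to state the GET encoding. What such a facility would have to do can be sketched as decoding the raw query string with an explicitly chosen charset, independent of the body encoding. `QueryDecoding.parseQuery` is a hypothetical helper, not a Servlet API method; in a servlet the raw string could come from `HttpServletRequest.getQueryString()`, which returns the query string without decoding it. The sketch assumes Java 10 or later for `URLDecoder.decode(String, Charset)`.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryDecoding {
    // Hypothetical helper: decodes a raw (still percent-encoded) query string
    // with an explicitly chosen charset, independent of the body encoding.
    // Names may repeat (e.g. "page=2&page=3"), so each maps to a list of values.
    static Map<String, List<String>> parseQuery(String rawQuery, Charset charset) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // URLDecoder also turns '+' into a space, as in HTML form encoding;
            // malformed "%" sequences make it throw IllegalArgumentException.
            params.computeIfAbsent(URLDecoder.decode(name, charset), k -> new ArrayList<>())
                  .add(URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumed query: the UTF-8 octets of the Greek word "Kalimera", percent-encoded.
        String query = "q=%CE%9A%CE%B1%CE%BB%CE%B7%CE%BC%CE%AD%CF%81%CE%B1&page=2";
        Map<String, List<String>> asUtf8 = parseQuery(query, StandardCharsets.UTF_8);
        Map<String, List<String>> asLatin1 = parseQuery(query, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8.get("q").get(0).length());   // 8
        System.out.println(asLatin1.get("q").get(0).length()); // 16
        System.out.println(asUtf8.get("page"));                // [2]
    }
}
```

Keeping the query charset as an explicit parameter reflects Yoshida's observation that the GET encoding need not match the POST body encoding.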

