tomcat-users mailing list archives

From André Warnier
Subject Re: Odd encoding of servlet parameters
Date Thu, 27 Nov 2008 12:45:45 GMT
Chris Mannion wrote:
> Hi All
> I've recently started having a problem with one of the servlets I'm
> running on a Tomcat 5.5 system.  The code of the servlet hasn't
> changed at all so I'm wondering if there are any Tomcat settings that
> could affect this kind of thing or if anyone has come across a similar
> problem before.
> The servlet in question accepts XML data that is posted to it as a URL
> parameter called 'xml'.  The code to retrieve the XML as a String
> (which is then used to build a document object) is simply -
> String xmlMessage = req.getParameter("xml");
> - where req is the HttpServletRequest object.  Until recently this has
> worked fine with the XML being received properly formatted -
> <?xml version="1.0" encoding="UTF-8"?>
>   <records>
>     <record>...
> etc.
> However, recently something has changed and the XML is now being
> retrieved from the request object with escape characters in, so the
> above has become -
> &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
>   &lt;records&gt;
>     &lt;record&gt;
> Before sending the XML is encoded using the object
> and the UTF-8 character set, but using a on
> receiving it does not get rid of the encoded characters.  I did some
> reading about a possible Tomcat 6.0 bug and so tried explicitly
> setting the character encoding (req.setCharacterEncoding("UTF-8"))
> before retrieving the parameter but that had no effect either and even
> if there's something that could explicitly decode the &lt; &gt; etc. I
> couldn't use it as the XML data often contains characters like &amp;
> which have to remain encoded to keep the XML valid.
> As I said, this problem started without the servlet code having
> changed at all so is there any Tomcat setting that could be
> responsible for this?
Just a couple of indirect comments on the above.

In your post, you seem to indicate that you also control the client 
which sends the request to Tomcat.
If so, and for that kind of data, might it not be better to send the 
data in the body of the request instead of in the URL?
That is probably not the root cause of the issue you describe above, 
but it may avoid similar encoding questions in the future.
(Look at the HTTP POST method, and at enctype=multipart/form-data.)
It will also avoid the case where your data gets so long that the 
request URLs (and thus your data) get cut off at a certain length.
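
A minimal sketch of the body-based approach follows, in Java to match the 
servlet code. It assumes Java 8 or later, swaps in a plain application/xml 
POST body instead of the multipart/form-data form encoding mentioned above, 
and uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in 
for Tomcat; the class name, the /xml path and the echo handler are all 
hypothetical. In an actual servlet, the receiving side would call 
req.setCharacterEncoding("UTF-8") and then read the body through 
req.getReader() rather than calling req.getParameter("xml").

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostBodyDemo {

    // Reads a whole stream into memory.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Client side: send the XML as the raw POST body, UTF-8 encoded, with no URL-encoding step.
    static String postXml(URL url, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(readAll(in), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts a throwaway local server (standing in for the servlet) that echoes the
    // request body back, posts xml to it, and returns what came back.
    static String roundTrip(String xml) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/xml", exchange -> {
            byte[] body = readAll(exchange.getRequestBody());
            // -1 tells HttpServer there is no response body at all
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return postXml(URI.create("http://127.0.0.1:" + port + "/xml").toURL(), xml);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<records><record>a &amp; b</record></records>";
        System.out.println(roundTrip(xml).equals(xml)); // true: nothing was escaped in transit
    }
}
```

Because the XML travels as raw UTF-8 bytes, nothing along the way has to 
URL-encode or decode it, so what the receiver reads is byte-for-byte what 
the client wrote, including any &amp; that must stay escaped.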

Next, the way you say the data is now received shows an "HTML-style" 
encoding rather than a "URL-style" encoding.
If the data were URL-encoded, it would not have (for example) 
"&quot;" in place of a quotation mark; it would have a %xy sequence 
instead, where xy is the iso-8859-1 codepoint of the character 
in hexadecimal digits (%22 for a quotation mark).
What I mean is that it is very unlikely that this encoding just happens 
"automatically" due to some protocol layer at the browser or HTTP server 
level.  There must be something that explicitly encodes your original 
request data in this way, before it even gets put in a URL.
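
To make the difference concrete, here is a small sketch in plain Java (no 
Tomcat involved) that compares the two styles. The helpers htmlEscape, 
buildQuery and receivedParameter are hypothetical, and receivedParameter is 
only a rough stand-in for what the container does when the servlet calls 
getParameter(). It shows that URL-decoding reverses URL-encoding exactly, 
while an extra HTML-style escaping step on the sending side passes through 
untouched and yields exactly the kind of data you are now seeing.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EncodingStyles {

    // Hypothetical helper: HTML/XML-style entity escaping, the kind seen in the received data.
    // "&" is replaced first so the entities added afterwards are not escaped a second time.
    static String htmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Hypothetical sender: optionally HTML-escapes the XML, then URL-encodes it
    // as the value of an "xml" query parameter.
    static String buildQuery(String xml, boolean escapeFirst) throws Exception {
        String payload = escapeFirst ? htmlEscape(xml) : xml;
        return "xml=" + URLEncoder.encode(payload, "UTF-8");
    }

    // Simplified stand-in for the container's parameter parsing: take the value after
    // the first "=" and URL-decode it. (Real parsing also splits parameters on "&".)
    static String receivedParameter(String query) throws Exception {
        return URLDecoder.decode(query.substring(query.indexOf('=') + 1), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><records/>";

        // URL-style encoding on the wire: %xy sequences, never &quot; entities
        System.out.println(buildQuery(xml, false));
        // prints xml=%3C%3Fxml+version%3D%221.0%22+encoding%3D%22UTF-8%22%3F%3E%3Crecords%2F%3E

        // URL-decoding undoes URL-encoding exactly...
        System.out.println(receivedParameter(buildQuery(xml, false)));
        // prints <?xml version="1.0" encoding="UTF-8"?><records/>

        // ...but an extra escaping step on the sender survives the decoding untouched
        System.out.println(receivedParameter(buildQuery(xml, true)));
        // prints &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;&lt;records/&gt;
    }
}
```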

I guess what I am trying to say is that maybe you are looking in the 
wrong place for your problem by focusing on the receiving Tomcat side 
first. I believe you should first take a good look at the sending side.
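
One way to check which side is responsible, sketched under the assumption 
that the data still arrives as a URL (GET) parameter: look at the raw query 
string before Tomcat decodes it, which a servlet can obtain with 
req.getQueryString(). If the sender had already HTML-escaped the XML before 
URL-encoding it, a "<" will show up there as %26lt%3B (an encoded "&lt;") 
rather than as %3C. The class and method names below are hypothetical.

```java
import java.net.URLEncoder;
import java.util.Locale;

public class RawQueryCheck {

    // Hypothetical diagnostic for the receiving side. rawQuery is the query string
    // *before* any URL-decoding (e.g. the value of HttpServletRequest.getQueryString()).
    // An HTML-escaped "<" that was then URL-encoded travels as "%26lt%3B"; a plain "<"
    // travels as "%3C". Upper-casing makes the check work for lower-case hex digits too.
    static boolean escapedBySender(String rawQuery) {
        return rawQuery.toUpperCase(Locale.ROOT).contains("%26LT%3B");
    }

    public static void main(String[] args) throws Exception {
        String plain = "xml=" + URLEncoder.encode("<records/>", "UTF-8");
        String escaped = "xml=" + URLEncoder.encode("&lt;records/&gt;", "UTF-8");
        System.out.println(escapedBySender(plain));   // false
        System.out.println(escapedBySender(escaped)); // true
    }
}
```

If this reports true for the incoming requests, the escaping is already in 
the URL the client sends, and no Tomcat setting on the receiving side will 
undo it.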
