cocoon-dev mailing list archives

From Gunnar Brand
Subject Double URLdecoding problem with request.getSitemapURI()
Date Tue, 04 Nov 2003 15:41:43 GMT

While creating an application that maps a complete path to a resourcereader 
(to retrieve documents from a storage transparently), I noticed a bug(?). 
Whenever the name/path of the file contained a '+' (of course properly 
encoded as %2b), it didn't find the file. The storage server echoes the 
looked up path/file and instead of the '+' there was a ' ' (+ is a 
placeholder for ' ' ).

Since the %2b does work if I get parameters directly from the request, I 
deduced that there must be some double url decoding going on. After some 
investigation it was clear that the incorrect url was fed into the reader 
and wildcard matcher so it had to happen a bit earlier already.
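The effect can be reproduced outside Cocoon. The following is only a minimal sketch: the class name DoubleDecodeDemo, the helper decodeTimes and the sample string "a%2bb" are made up for illustration, and UTF-8 is assumed as the encoding. It shows that one pass through java.net.URLDecoder turns %2b into '+', while a second pass turns that '+' into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration only: names and the sample input are hypothetical.
final class DoubleDecodeDemo {

    /** URL-decodes the input the given number of times, assuming UTF-8. */
    static String decodeTimes(String input, int times) {
        String result = input;
        try {
            for (int i = 0; i < times; i++) {
                result = URLDecoder.decode(result, "UTF-8");
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String encoded = "a%2bb";
        // Decoded once: the escaped plus sign is restored.
        System.out.println("[" + decodeTimes(encoded, 1) + "]"); // [a+b]
        // Decoded twice: the restored '+' is read as an encoded space.
        System.out.println("[" + decodeTimes(encoded, 2) + "]"); // [a b]
    }
}
```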

After a quick modification in the samples sitemap (adding ** in front of 
the match) and the RequestGenerator, I could use any path I wanted. The 
generator now displays not only request.getRequestURI() but also 
request.getSitemapURI():

   this.attribute(attr,"target", request.getRequestURI());
   this.attribute(attr,"sitemaptarget", request.getSitemapURI());   // <-- added

With a url like
it prints (shortened a bit):

   sitemaptarget="a test dir with a plus at the end /request.html" source="">
   <h:parameter name="test">
     <h:value> +x y</h:value>

So obviously the sitemap URI was decoded twice. The culprit seems to be 
the, so I added a small debug output (code below is from 
cvs HEAD):

   public void service(HttpServletRequest req, HttpServletResponse res)
     throws ServletException, IOException {
         // ...
         // We got it... Process the request
         String uri = request.getServletPath();
         System.out.println("request.getServletPath():" + uri);  // added
         if (uri == null) {
             uri = "";
         }
         String pathInfo = request.getPathInfo();
         // ...
         Environment env;
         try {
             if (uri.charAt(0) == '/') {
                 uri = uri.substring(1);
             }
 >>> line 1087:
             env = getEnvironment(URLDecoder.decode(uri), request, res);
         } catch (Exception e) {
         // ...

The debug output is:
request.getServletPath():/samples/a test dir with a plus at the 

So the request.getServletPath() method returns a "url" that is already 
properly decoded and that is being decoded for the second time in line 
1087. This is true for both Jetty and Tomcat 4.1.

Unfortunately a look into the Servlet API does not indicate if 
getServletPath is supposed to return a decoded or still URLencoded string.

public java.lang.String getServletPath()
Returns the part of this request's URL that calls the servlet. This includes
either the servlet name or a path to the servlet, but does not include any
extra path information or a query string.
Same as the value of the CGI variable SCRIPT_NAME.

Returns: a String containing the name or path of the servlet being called,
          as specified in the request URL

The big question now is, is this a bug - or are there cases where this 
method is returning encoded strings?
(To me it does look like a bug, and I need to remove the second decode to 
get my application working ;)
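
If the container really does hand over decoded values, the change would amount to building the sitemap URI without calling URLDecoder again. The following is a rough sketch of that idea, not a patch against Cocoon: the class SitemapUriSketch and the helper toSitemapUri are hypothetical, and appending pathInfo to the servlet path is an assumption, since the excerpt above elides that part.

```java
// Hypothetical helper, not part of Cocoon.
final class SitemapUriSketch {

    /**
     * Builds the URI passed on to the sitemap from values that the
     * servlet container is assumed to have decoded already.
     */
    static String toSitemapUri(String servletPath, String pathInfo) {
        String uri = (servletPath == null) ? "" : servletPath;
        // Assumption: extra path information is simply appended.
        if (pathInfo != null) {
            uri += pathInfo;
        }
        // Strip one leading slash, as the quoted code does.
        if (uri.length() > 0 && uri.charAt(0) == '/') {
            uri = uri.substring(1);
        }
        // Deliberately no URLDecoder.decode(uri): a literal '+' stays a '+'.
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(toSitemapUri("/samples/a+b/request.html", null));
    }
}
```

With this shape a path that arrives as "/samples/a+b/request.html" keeps its plus sign. Whether any container returns a still-encoded servlet path, in which case this would decode too little, is exactly the open question.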


G. Brand - interface:projects GmbH
Tolkewitzer Strasse 49
D-01277 Dresden
