Message-Id: <5.0.2.1.2.20031104154314.01c7ae70@pop.interface-business.de>
Date: Tue, 04 Nov 2003 16:41:43 +0100
To: dev@cocoon.apache.org
From: Gunnar Brand
Subject: Double URLdecoding problem with request.getSitemapURI()

Hello!

While creating an application that maps a complete path to a resource reader (to retrieve documents from a storage server transparently), I noticed a bug(?). Whenever the name or path of the file contained a '+' (properly encoded as %2b, of course), the file was not found. The storage server echoes the path it looked up, and instead of the '+' there was a ' ' ('+' is the URL-encoded placeholder for ' '). Since %2b does work when I read parameters directly from the request, I deduced that some double URL decoding must be going on.

After some investigation it was clear that the incorrect URL was already being fed into the reader and the wildcard matcher, so the decoding had to happen earlier. After a quick modification of the samples sitemap (adding ** in front of the match) and of the RequestGenerator, I could use any path I wanted. The generator now displays not only request.getRequestURI() but also request.getSitemapURI().

RequestGenerator.java:

    this.attribute(attr, "target", request.getRequestURI());
    this.attribute(attr, "sitemaptarget", request.getSitemapURI()); // <-- added

With a URL like

    http://rei:8080/samples/a%20test%20dir%20with%20a%20plus%20at%20the%20end%2B/request.html?test=%20%2bx+y

it prints (shortened a bit):

    +x y

So obviously the sitemap URI was decoded twice. The culprit seems to be CocoonServlet.java, so I added a small debug output (the code below is from cvs HEAD):

    public void service(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException {
        // We got it... Process the request
        String uri = request.getServletPath();
        System.out.println("request.getServletPath():" + uri); // added
        if (uri == null) {
            uri = "";
        }
        String pathInfo = request.getPathInfo();
        .....
        Environment env;
        try{
            if (uri.charAt(0) == '/') {
                uri = uri.substring(1);
            }
    >>> line 1087:
            env = getEnvironment(URLDecoder.decode(uri), request, res);
        } catch (Exception e) {
        ...

The debug output is:

    request.getServletPath():/samples/a test dir with a plus at the end+/request.html

So request.getServletPath() returns a "URL" that is already properly decoded, and it is then decoded a second time in line 1087. This is true for both Jetty and Tomcat 4.1.

Unfortunately, a look into the Servlet API does not indicate whether getServletPath() is supposed to return a decoded or a still URL-encoded string:

    getServletPath()

    public java.lang.String getServletPath()

    Returns the part of this request's URL that calls the servlet. This
    includes either the servlet name or a path to the servlet, but does
    not include any extra path information or a query string. Same as
    the value of the CGI variable SCRIPT_NAME.

    Returns: a String containing the name or path of the servlet being
    called, as specified in the request URL

The big question now is: is this a bug, or are there cases where this method returns encoded strings? (To me it does look like one, and I need to remove the second decode to get my application working ;)

Gunnar.

-- 
G. Brand - interface:projects GmbH
Tolkewitzer Strasse 49
D-01277 Dresden
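
The double decoding described above can be reproduced outside Cocoon with a minimal standalone sketch. It assumes only java.net.URLDecoder; the class name DoubleDecodeDemo and the helpers decodeOnce and decodeTwice are hypothetical and are not part of CocoonServlet. The first decode stands in for what the container appears to do before getServletPath() returns, the second for the extra URLDecoder.decode call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DoubleDecodeDemo {

    // One decoding pass; stands in for what the servlet container appears
    // to have done already (an observation from the debug output above,
    // not something the quoted API text guarantees).
    public static String decodeOnce(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // A second pass over the already decoded string; stands in for the
    // additional URLDecoder.decode(uri) call in the servlet.
    public static String decodeTwice(String encoded) {
        return decodeOnce(decodeOnce(encoded));
    }

    public static void main(String[] args) {
        // Hypothetical path fragment modelled on the test URL above.
        String encoded = "a%20plus%20at%20the%20end%2B/request.html";

        // First pass: %20 becomes ' ' and %2B becomes '+'.
        System.out.println(decodeOnce(encoded));
        // prints "a plus at the end+/request.html"

        // Second pass: the literal '+' is now treated as an encoded space.
        System.out.println(decodeTwice(encoded));
        // prints "a plus at the end /request.html"
    }
}
```

The second line of output shows the same symptom the storage server reported: the '+' that survived the first pass is turned into a space by the second.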