From: "William A. Rowe, Jr."
Date: Mon, 08 Jul 2002 11:03:48 -0500
To: dev@httpd.apache.org
Cc: Ryan Bloom, dev@httpd.apache.org, 'Dale Ghent'
Subject: RE: PATH_INFO in A2?
Message-Id: <5.1.0.14.2.20020708104353.02f23718@pop3.rowe-clan.net>

At 12:02 PM 7/6/2002, Rasmus Lerdorf wrote:
> > > What is a dynamic page if not a PHP page?
> >
> > Like I said, Apache doesn't know if a file on disk is meant for PHP or
> > not.  The best way to fix this would be for mod_php to set the value if
> > the filter is added for the request.
> >
> > I agree, it would be cool if Apache could set this correctly based on
> > the filters that have been added for the request.
>
> Seems like there should be an API call where a filter can specify whether
> it is a dynamic one or not as opposed to specifically flipping the
> acceptpathinfo bit.

There is, and they shouldn't.  See core.c:3145

    /* Deal with the poor soul who is trying to force path_info to be
     * accepted within the core_handler, where they will let the subreq
     * address its contents.  This is toggled by the user in the very
     * beginning of the fixup phase, so modules should override the user's
     * discretion in their own module fixup phase.  It is tristate, if
     * the user doesn't specify, the result is 2 (which the module may
     * interpret to its own customary behavior.)  It won't be touched
     * if the value is no longer undefined (2), so any module changing
     * the value prior to the fixup phase OVERRIDES the user's choice.
     */
    if ((r->used_path_info == AP_REQ_DEFAULT_PATH_INFO)
        && (conf->accept_path_info != 3)) {
        r->used_path_info = conf->accept_path_info;
    }

Effectively, we allow any module to set r->used_path_info to flag that
the request is valid with path_info.
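
To make the tristate concrete, here is a minimal standalone model of that check. It is a sketch, not the httpd source: the struct and function names (`model_request`, `model_dir_conf`, `model_core_fixup`) are hypothetical, and the 0/1 values for accept/reject are an assumption. The quoted code only confirms 2 (default) and 3 (not set in the config).

```c
#include <assert.h>

/* Standalone model of the tristate check quoted above.
 * All names are hypothetical; this is not httpd code. */
enum {
    MODEL_ACCEPT_PATH_INFO  = 0, /* assumed value for "accept" */
    MODEL_REJECT_PATH_INFO  = 1, /* assumed value for "reject" */
    MODEL_DEFAULT_PATH_INFO = 2, /* "undefined (2)" per the quoted comment */
    MODEL_CONF_UNSET        = 3  /* config never mentioned the setting */
};

struct model_request  { int used_path_info; };
struct model_dir_conf { int accept_path_info; };

/* Mirrors the quoted if: copy the configured value only when no module
 * has already changed the request's flag and the config actually sets one. */
static void model_core_fixup(struct model_request *r,
                             const struct model_dir_conf *conf)
{
    if (r->used_path_info == MODEL_DEFAULT_PATH_INFO
        && conf->accept_path_info != MODEL_CONF_UNSET) {
        r->used_path_info = conf->accept_path_info;
    }
}
```

In this model, a request whose flag was already changed by a module passes through untouched, and an unset config leaves the flag at 2 for the handling module to interpret.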

The answer to your question in another post, 'Why not always default
to on???', is pretty simple.  Most filters don't process path info.
PHP, Perl and includes sometimes use the path info and sometimes don't.
That's why it can be turned off anywhere, even if your preference is to
allow it.  Modules can override the -default- preference, but never an
explicit preference in the config.
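
One way a module could honor that rule is sketched below, assuming it substitutes its own customary behavior only when the flag is still at the default. The function name `sk_path_info_allowed`, the `SK_*` constants and the `module_default` parameter are hypothetical, and the 0/1 values are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants; only the value 2 (default) comes from the
 * quoted core.c comment, the other two are assumed. */
enum { SK_ACCEPT = 0, SK_REJECT = 1, SK_DEFAULT = 2 };

/* Returns 1 if a handler should serve a request carrying this path_info,
 * 0 if it should answer 404.  module_default is what this module
 * customarily does when the admin expressed no preference. */
static int sk_path_info_allowed(int used_path_info,
                                const char *path_info,
                                int module_default)
{
    if (path_info == NULL || *path_info == '\0') {
        return 1;                         /* no trailing path to judge */
    }
    if (used_path_info == SK_DEFAULT) {
        used_path_info = module_default;  /* override only the default */
    }
    return used_path_info == SK_ACCEPT;   /* explicit config always wins */
}
```

With this shape, an explicit reject in the config yields 404 even for a module whose own default is to accept.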

So why not leave it on everywhere?  That's been discussed many times;
the IBM WebSphere product did so by default (even for static content).  The
problem is recursion: you have no way of telling a robot or other scanner
that there are no more -new- pages here to read, so it can just pound
the heck out of duplicated URI space.

We used to have this on www.apache.org/httpd.html. Folks started using
www.apache.org/httpd [multiviewed].  Then folks started referring to
www.apache.org/httpd/index.html, as if this were a directory.  It never was.

It's only productive to allow PATH_INFO if the application (PHP, Perl, etc)
can provide a 404 itself if the PATH_INFO isn't interesting.  Includes can't
do that, which is why I provided the r->used_path_info flag to advise that
some filter might be consuming that bit.
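
As a rough illustration of an application answering 404 itself, the sketch below accepts only the trailing paths it knows about. The route table and the name `app_status_for` are made up for the example and do not come from PHP, Perl or httpd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of trailing paths this application understands. */
static const char *const known_paths[] = { "", "/view", "/edit" };

/* Returns the HTTP status the application would send for a given
 * PATH_INFO: 200 for a path it recognizes, 404 for anything else. */
static int app_status_for(const char *path_info)
{
    size_t i;

    if (path_info == NULL) {
        path_info = "";   /* treat a missing PATH_INFO as empty */
    }
    for (i = 0; i < sizeof known_paths / sizeof known_paths[0]; i++) {
        if (strcmp(path_info, known_paths[i]) == 0) {
            return 200;
        }
    }
    return 404;
}
```

Because a path such as "/view/view" is not in the table, it gets a 404, so a crawler cannot keep descending into duplicated URI space.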

Bill