httpd-dev mailing list archives

From "Roy T. Fielding" <field...@kiwi.ICS.UCI.EDU>
Subject Re: CGIWrap Problems (fwd)
Date Tue, 29 Apr 1997 10:06:48 GMT
>> This is a bug, not a feature.  The find_path_info routine does not
>> work right if the request has been made to a ScriptAlias which
>> includes its own path_info as part of the result.  The script is then
>> fed the wrong SCRIPT_NAME, PATH_INFO, and PATH_TRANSLATED because
>> find_path_info thinks that the real script name is part of the path.
>
>No, I disagree. It's a feature, not a bug. As I said in my other
>message, the current behavior is the *only* way to maintain the
>http://$SERVER_NAME:$SERVER_PORT$SCRIPT_NAME$PATH_INFO?$QUERY_STRING
>behavior that I believe is desirable. ScriptAlias was designed to map
>a URL to a directory that contained scripts. It was not intended to map
>to CGI scripts, and was certainly not intended to map to CGI
>scripts with additional path info not present in the request. It's
>just an accident that people made use of.

That may be the reason for doing it this way, but it doesn't change
the fact that it results in the wrong information being given to
the script.  It is not a feature if it breaks the definition of SCRIPT_NAME,
which is exactly what it is doing.

>SCRIPT_NAME and PATH_INFO, as defined, come out of the *URL*. The CGI
>spec only defines PATH_TRANSLATED as being based on the filename. And
>any extra stuff tacked on with ScriptAlias is very definitely not
>coming from any sort of URL, but from a filename. Therefore it is not
>appropriate for it to be present in PATH_INFO.

There are two issues here.  First, if the above is what is wanted, the
implementation is still wrong (see below).  Second, Apache is opening
a CGI gateway to the script cgiwrapd, and thus a ScriptAlias is performing
an internal redirection on a new URL whether or not that was intended
by the original definition of ScriptAlias [if we aren't doing that, then
we should return 404 Not Found instead].  You are assuming that the *URL*
is the Request-URI, but that is not how it was implemented in NCSA httpd
or Apache 1.1 or CERN httpd (the only real definition of CGI that matters).
In fact, this is exactly what David wrote in the CGI spec:

   A `Script URI' can be defined; this describes the resource identified
   by the environment variables. Often, this URI will be the same as the
   URI requested by the client (the `Client URI'); however, it need not
   be. Instead, it could be a URI invented by the server, and so it can
   only be used in the context of the server and its CGI interface.

   The script URI has the syntax of generic-RL as defined in section 2.1
   of RFC 1808 [7], with the exception that object parameters and
   fragment identifiers are not permitted:

      <scheme>://<host>:<port>/<path>?<query>

   The various components of the script URI are defined by some of the
   environment variables (see below);

      script-uri = protocol "://" SERVER_NAME ":" SERVER_PORT enc-script
                   enc-path-info "?" QUERY_STRING

   where `protocol' is found from SERVER_PROTOCOL, `enc-script' is a
   URL-encoded version of SCRIPT_NAME and `enc-path-info' is a
   URL-encoded version of PATH_INFO.

In other words, the definitions of SCRIPT_NAME and PATH_INFO are based
on the internal URL defined by ScriptAlias and not on the Request-URI.
The fact that the internal URL is not accessible may be less important
than the ability to pass path_info via the ScriptAlias mechanism.
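
To make the quoted production concrete, here is a minimal sketch in Python
(not Apache code) that assembles a script URI from the environment
variables. The function name script_uri and the sample host are
hypothetical, and two choices are assumptions of this sketch rather than
anything the spec text above states: the protocol is taken as the
lower-cased token before the "/" in SERVER_PROTOCOL, and the "?" is left
off when QUERY_STRING is empty.

```python
from urllib.parse import quote


def script_uri(env):
    """Assemble a script URI from CGI environment variables (illustrative sketch)."""
    # Assumption: "HTTP/1.0" -> "http"; the quoted text only says the protocol
    # "is found from SERVER_PROTOCOL".
    protocol = env["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    # enc-script and enc-path-info are URL-encoded forms of SCRIPT_NAME and
    # PATH_INFO; quote() leaves "/" unescaped by default.
    enc_script = quote(env["SCRIPT_NAME"])
    enc_path_info = quote(env.get("PATH_INFO", ""))
    uri = "{}://{}:{}{}{}".format(
        protocol, env["SERVER_NAME"], env["SERVER_PORT"], enc_script, enc_path_info
    )
    # Assumption: omit the "?" when there is no query string.
    query = env.get("QUERY_STRING", "")
    if query:
        uri += "?" + query
    return uri


example_env = {
    "SERVER_PROTOCOL": "HTTP/1.0",
    "SERVER_NAME": "www.example.com",  # hypothetical host
    "SERVER_PORT": "80",
    "SCRIPT_NAME": "/cgi-bin/test-cgi.pl",
    "PATH_INFO": "/klgeddie/test",
    "QUERY_STRING": "",
}
print(script_uri(example_env))
```

Note that the result depends only on what the server puts in SCRIPT_NAME
and PATH_INFO, not on the Request-URI the client sent.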

Even if they were defined by the Request-URI, the implementation is still
wrong.  Look at the example given by the user:

  ScriptAlias /cgi-bin/ /magma/web/cgi-bin/cgiwrapd/userid00/

calling the Request-URI:

  /cgi-bin/test-cgi.pl/klgeddie/test

with the following record info

  r->path_info       /userid00/test-cgi.pl/klgeddie/test
  r->uri             /cgi-bin/test-cgi.pl/klgeddie/test

produces:

  SCRIPT_NAME: '/cgi-bin'
  PATH_INFO: '/test-cgi.pl/klgeddie/test'

According to your definition of SCRIPT_NAME and PATH_INFO,
the result should be

  SCRIPT_NAME: '/cgi-bin/test-cgi.pl'
  PATH_INFO: '/klgeddie/test'

The reason it isn't is that the find_path_info routine doesn't work right.
The fact that both the incorrect result AND the correct result preserve 
http://$SERVER_NAME:$SERVER_PORT$SCRIPT_NAME$PATH_INFO?$QUERY_STRING
is not relevant.  We would need a function that correctly partitions the
$SCRIPT_NAME$PATH_INFO part.  The algorithm that could be followed is,
when ScriptAlias is present, the complete path component of the alias
(/cgi-bin) and the following component (/test-cgi.pl) are the SCRIPT_NAME,
and anything beyond that is PATH_INFO.  We'd probably need to calculate
that when the ScriptAlias is applied (in mod_alias) and store it in the
request_rec.
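
A rough sketch of that partitioning, written in Python for brevity rather
than as a patch to mod_alias. The function name split_script_alias is
hypothetical, it works on the URL path only, and it takes for granted that
the component following the alias names the script, which is the
assumption the algorithm above makes.

```python
def split_script_alias(alias_prefix, request_path):
    """Return (SCRIPT_NAME, PATH_INFO) for a request under a ScriptAlias.

    SCRIPT_NAME is the alias path plus the one component that follows it;
    PATH_INFO is whatever remains. Illustrative only.
    """
    prefix = alias_prefix.rstrip("/")  # "/cgi-bin/" -> "/cgi-bin"
    if request_path != prefix and not request_path.startswith(prefix + "/"):
        raise ValueError("request path is not under the alias")
    rest = request_path[len(prefix):]  # e.g. "/test-cgi.pl/klgeddie/test"
    component, sep, tail = rest[1:].partition("/")
    if not component:
        # Nothing follows the alias itself; leave the remainder as PATH_INFO.
        return prefix, rest
    return prefix + "/" + component, sep + tail


script_name, path_info = split_script_alias(
    "/cgi-bin/", "/cgi-bin/test-cgi.pl/klgeddie/test"
)
print(script_name)  # /cgi-bin/test-cgi.pl
print(path_info)    # /klgeddie/test
```

Both this split and the incorrect one ('/cgi-bin' plus
'/test-cgi.pl/klgeddie/test') concatenate back to the same path, which is
why preserving the concatenation alone does not show the partition is right.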

However, what is the point in supporting CGI if we do this differently
than all the other servers?  We cannot change the way ScriptAlias is
currently used any more than we can change existing scripts.  Therefore,
we need to reimplement it according to the old behavior (without the
core dump) and supply some other variable like REQUEST_URI for those
scripts that need both ScriptAlias path_info and the actual Request-URI.
We are not just talking about CgiWrap;  Dienst (the Technical Report
service) also depends on passing path_info within ScriptAlias.

Regardless of which way we go, it is a bug right now.

.....Roy
