tomcat-dev mailing list archives

From "William A. Rowe, Jr."
Subject Re: svn commit: r423967 - /tomcat/tc6.0.x/trunk/java/org/apache/catalina/connector/
Date Fri, 21 Jul 2006 00:06:42 GMT
Jean-frederic Clere wrote:
> William A. Rowe, Jr. wrote:
>> Guys, let me clarify: you are only paying attention to ';' following the
>> QUERY_STRING delimiter '?', correct?
>> ';' means nothing special before the '?', so double-check your interpretation
>> of RFC 2616.  I can have /;bash?v1=a;v2=b (or ...?v1=a&v2=b) and that
>> semi is part of the ;bash filename.  Right?
> Then what I have just committed is not right...
> But in mod_jk the behaviour without the patch is weird.
> Try:
> JkMount /*.jsp worker1
> and a URL like http://localhost/;jsp-examples/jsp2/;simpletag/;hello.jsp
> without the patches.

That may mean the core Tomcat parser doesn't parse according to RFC 2616...
or it's simply an issue that ';' should be escaped.  See RFC 2616 section 3.2.3:

    Characters other than those in the "reserved" and "unsafe" sets (see
    RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

and RFC 2396 in turn says:

2.2. Reserved Characters

    Many URI include components consisting of or delimited by, certain
    special characters.  These characters are called "reserved", since
    their usage within the URI component is limited to their reserved
    purpose.  If the data for a URI component would conflict with the
    reserved purpose, then the conflicting data must be escaped before
    forming the URI.

       reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                     "$" | ","

    The "reserved" syntax class above refers to those characters that are
    allowed within a URI, but which may not be allowed within a
    particular component of the generic URI syntax; they are used as
    delimiters of the components described in Section 3.
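
To make "must be escaped" concrete, here is a minimal Java sketch (the class and method names are hypothetical, not Tomcat's). It percent-encodes every byte outside the RFC 2396 "unreserved" set. That is stricter than a path segment strictly needs, but always safe. Encoding the text as UTF-8 first is an assumption; RFC 2396 itself does not fix a character set.

```java
import java.nio.charset.StandardCharsets;

public class ReservedEscape {
    // RFC 2396 "mark" characters; together with letters and digits they form "unreserved"
    private static final String MARKS = "-_.!~*'()";

    static boolean isUnreserved(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || MARKS.indexOf(c) >= 0;
    }

    // Percent-encodes every byte that is not "unreserved", so reserved characters
    // such as ';' and '/' travel as data instead of acting as delimiters.
    static String escapeSegmentData(String data) {
        StringBuilder out = new StringBuilder();
        for (byte b : data.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isUnreserved(c)) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Under the RFC 2396 segment grammar, a file literally named ";bash"
        // has to be requested as /%3Bbash
        System.out.println("/" + escapeSegmentData(";bash"));
    }
}
```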

Now I realize that Tomcat gets its clue on ";" from the same RFC 2396:

3.3. Path Component

    The path component contains data, specific to the authority (or the
    scheme if there is no authority component), identifying the resource
    within the scope of that scheme and authority.

       path          = [ abs_path | opaque_part ]

       path_segments = segment *( "/" segment )
       segment       = *pchar *( ";" param )
       param         = *pchar

       pchar         = unreserved | escaped |
                       ":" | "@" | "&" | "=" | "+" | "$" | ","

    The path may consist of a sequence of path segments separated by a
    single slash "/" character.  Within a path segment, the characters
    "/", ";", "=", and "?" are reserved.  Each path segment may include a
    sequence of parameters, indicated by the semicolon ";" character.
    The parameters are not significant to the parsing of relative
    references.
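
Here is a minimal sketch of reading an abs_path strictly by that grammar, applied to the URL from Jean-Frederic's mod_jk example. The class and method names are hypothetical, and this is not how Tomcat or mod_jk actually parse. Each "/"-separated segment is split at ";" into a name (*pchar) followed by zero or more params.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathSegments {
    /** One RFC 2396 path segment: segment = *pchar *( ";" param ). */
    static final class Segment {
        final String name;          // the *pchar part before the first ';'
        final List<String> params;  // each ";" param, without the ';'

        Segment(String name, List<String> params) {
            this.name = name;
            this.params = params;
        }

        @Override
        public String toString() {
            return "name=\"" + name + "\" params=" + params;
        }
    }

    /** Splits an abs_path (no query part) into segments, then each segment into name and params. */
    static List<Segment> parse(String absPath) {
        if (!absPath.startsWith("/")) {
            throw new IllegalArgumentException("abs_path must start with '/': " + absPath);
        }
        List<Segment> result = new ArrayList<>();
        // limit -1 keeps trailing empty strings, e.g. for a path ending in "/"
        for (String raw : absPath.substring(1).split("/", -1)) {
            String[] parts = raw.split(";", -1);
            result.add(new Segment(parts[0],
                    new ArrayList<>(Arrays.asList(parts).subList(1, parts.length))));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Segment s : parse("/;jsp-examples/jsp2/;simpletag/;hello.jsp")) {
            System.out.println(s);
        }
    }
}
```

Read this way, the last segment has an empty name and carries hello.jsp only as a parameter. A strict segment parser therefore sees no .jsp resource there, while a plain suffix match on the raw string does.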

But I was under the impression that RFC 2616 did NOT adopt this structure
for per-path-segment param values.  What we are discussing doesn't tell
Tomcat what to do with abs_path values from other protocols, only with
those from HTTP.

Now that I reread RFC 2616:

3.2.1 General Syntax

    URIs in HTTP can be represented in absolute form or relative to some
    known base URI [11], depending upon the context of their use. The two
    forms are differentiated by the fact that absolute URIs always begin
    with a scheme name followed by a colon. For definitive information on
    URL syntax and semantics, see "Uniform Resource Identifiers (URI):
    Generic Syntax and Semantics," RFC 2396 [42] (which replaces RFCs
    1738 [4] and RFC 1808 [11]). This specification adopts the
    definitions of "URI-reference", "absoluteURI", "relativeURI", "port",
    "host","abs_path", "rel_path", and "authority" from that
    specification.

I see it ***does*** adopt abs_path, and that includes the definition

       segment       = *pchar *( ";" param )

which means, in short, I believe the scheme parser of httpd is at least
partly flawed :)

Note that the definition of a URI abs_path param informs the resource on
a segment-by-segment basis.  This is quite different from the definition
of an http "query" part (not mentioned in 3.2.1 above):

   http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
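
Here is a minimal sketch of the split that production implies, assuming the request-URI is already in abs_path form (no scheme or host). The helper name is hypothetical. Everything before the first "?" is abs_path and falls under the segment grammar. Everything after it is the query, whose ";" characters the application interprets.

```java
public class HttpUrlParts {
    /** Splits a request-URI into { abs_path, query } at the first '?'; query is null if absent. */
    static String[] splitPathAndQuery(String requestUri) {
        int q = requestUri.indexOf('?');
        if (q < 0) {
            return new String[] { requestUri, null };   // no query part at all
        }
        return new String[] { requestUri.substring(0, q), requestUri.substring(q + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitPathAndQuery("/;bash?v1=a;v2=b");
        System.out.println("abs_path = " + parts[0]); // /;bash     (';' starts a segment param)
        System.out.println("query    = " + parts[1]); // v1=a;v2=b  (';' is left to the application)
    }
}
```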

Note especially RFC 2616's section 13.9...

    Unless the origin server explicitly prohibits the caching of their
    responses, the application of GET and HEAD methods to any resources
    SHOULD NOT have side effects that would lead to erroneous behavior if
    these responses are taken from a cache. They MAY still have side
    effects, but a cache is not required to consider such side effects in
    its caching decisions. Caches are always expected to observe an
    origin server's explicit restrictions on caching.

    We note one exception to this rule: since some applications have
    traditionally used GETs and HEADs with query URLs (those containing a
    "?" in the rel_path part) to perform operations with significant side
    effects, caches MUST NOT treat responses to such URIs as fresh unless
    the server provides an explicit expiration time.
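
Here is the exception written as a predicate. This is a minimal sketch with made-up names. It models only the "?" rule, not whether a response is actually fresh under the other caching rules. The second call shows why the ";" parameter style falls outside it.

```java
public class QueryFreshness {
    /**
     * Sketch of the RFC 2616 section 13.9 exception only: without an explicit
     * expiration time, a response to a URI containing '?' must not be treated
     * as fresh. Returns whether this particular exception permits freshness;
     * other freshness rules are deliberately not modeled.
     */
    static boolean queryExceptionAllows(String requestUri, boolean explicitExpiration) {
        if (explicitExpiration) {
            return true; // the exception does not restrict this case
        }
        return requestUri.indexOf('?') < 0;
    }

    public static void main(String[] args) {
        System.out.println(queryExceptionAllows("/deleteme?user=wrowe", false)); // false
        System.out.println(queryExceptionAllows("/deleteme;user=wrowe", false)); // true
        System.out.println(queryExceptionAllows("/deleteme?user=wrowe", true));  // true
    }
}
```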

If you use the segment *( ";" param ) form in your paths, ponder a moment:
because such URIs contain no "?", the exception above does not apply, and
those parameters to a GET or HEAD request will be ignored by the proxy in its
determination of whether to invalidate a stale cache entry.  They *are*
treated as unique, but a subsequent call to /deleteme;user=wrowe will *not*
cause the proxy to refetch the action from the origin server.  A subsequent
first request to GET /deleteme;user=jean-frederic would, of course, be passed
to the origin server, as that path is different from /deleteme;user=wrowe and
is not in the cache.
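
Here is a toy model of that scenario, a minimal sketch and not any real proxy's logic. It assumes no response carries an explicit expiration time and that every stored entry is still heuristically fresh. Entries are keyed by the full request-URI, so each ";" variant gets its own entry. A repeat of the same URI never reaches the origin server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ToyCache {
    private final Map<String, String> entries = new HashMap<>(); // keyed by the full request-URI
    private final Function<String, String> origin;                // stands in for the origin server
    int originHits = 0;                                           // requests that reached the origin

    ToyCache(Function<String, String> origin) {
        this.origin = origin;
    }

    String get(String requestUri) {
        // Assumption: every stored entry is still heuristically fresh,
        // so a hit is served without revalidation.
        String cached = entries.get(requestUri);
        if (cached != null) {
            return cached;
        }
        originHits++;
        String body = origin.apply(requestUri);
        // Stand-in for the section 13.9 exception: responses to '?' URIs are never reused here.
        if (requestUri.indexOf('?') < 0) {
            entries.put(requestUri, body);
        }
        return body;
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache(uri -> "deleted by " + uri);
        cache.get("/deleteme;user=wrowe");         // miss: forwarded to the origin server
        cache.get("/deleteme;user=wrowe");         // hit: the origin never sees this GET
        cache.get("/deleteme;user=jean-frederic"); // miss: a different URI, so a different entry
        System.out.println(cache.originHits);      // 2
    }
}
```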

I suspect a lot of GET/HEAD requests built on this parameter model are not
observing RFC 2616 and its cache-control logic, unless they explicitly
respond that the 'action' is not cacheable in the response headers :)

So please make sure you've thought this through and that Tomcat is doing
precisely what RFC 2616 declares, and note that my original objection does
not play out precisely the way I stated it.

