Return-Path: Delivered-To: apmail-new-httpd-archive@apache.org Received: (qmail 64586 invoked by uid 500); 5 Apr 2001 20:21:44 -0000 Mailing-List: contact new-httpd-help@apache.org; run by ezmlm Precedence: bulk Reply-To: new-httpd@apache.org list-help: list-unsubscribe: list-post: Delivered-To: mailing list new-httpd@apache.org Received: (qmail 64575 invoked from network); 5 Apr 2001 20:21:43 -0000 Date: Thu, 5 Apr 2001 21:29:27 +0100 From: Francis Daly To: new-httpd@apache.org Subject: [PATCH] mod_negotiation and order of suffixes Message-ID: <20010405212927.A9306@kerna.ie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i X-Spam-Rating: h31.sny.collab.net 1.6.2 0/1000/N Hi there, for your consideration, appended to this mail is a patch to remove the requirements on the order of suffixes when using MultiViews / mod_negotiation. This corresponds to (part of) PR3430. The patch is relative to the version of mod_negotiation.c distributed with apache-2.0.15. I believe that's identical to the one from apache-2.0.16. But first, some notes: The current method takes the "file" part of r->filename (either the bit after the final / in the URL, or the value of DirectoryIndex). First, if the exact filename matches, mod_negotiation declines to handle it. Second, for each file in the directory, it checks for (regex syntax) /^file\./, and only considers ones that match. This patched method, for the requested file "file" does the same thing after a single extra if(strchr()). However, if the r->filename is actually "file.s1.s2.sZ" (with dots), the current way looks for /^file\.s1\.s2\.sZ\./; the patched way looks for each of /^file\./, /\.s1/, /\.s2/, /\.sZ/. It bails out at the first failure. In this case, the patched code does an extra strchr, strlen, strstr, some pointer arithmetic and pointer movement, changes a character, and changes it back. Per dot in r->filename, per file in the directory. I don't have numbers for how expensive that extra string manipulation is. Some consequences of this implementation are: Current method: file "name.en.html" is only accessible through (partial) URIs "name", "name.en", or "name.en.html" Patched method: The same three work, as do "name.html" and "name.html.en". That's good. However: so do "name.htm", "name.htm.en", and "name.en.htm". That may or may not be considered good. More however: so do "name.h", "name.h.h", "name...h.e.e..e.h.h.", and an infinite number of similar variations. That may not be considered good. [ side note -- most of that infinitude could be eliminated, if desired, by (for example) checking that the length of r->filename (prefix_len, in the code) is not more than the length of dirent.name, immediately before the while loop which looks for dots in filp. I would consider that an enhancement to, rather than an integral part of, the patch, so didn't include it here. Opinions may differ ] In each case, the content is returned with a Content-Location: header indicating the canonical filename. The requirements are (1)r->filename up to the first dot must match the real filename up to the first dot; (2)each .suffix in r->filename must exist (string match) in the real filename; (3)the real filename must correspond to a known mime-type, encoding, etc -- which I think means that the final suffix must be known, and only suffixes followed by known suffixes are considered. As a real example, testing with the apache "It worked!" page (named index.html.LANG), if I request index.html.fr, I get the page back. If I request index.fr.html, or just index.fr, I get back the 406 Not Acceptable page, with a link to index.html.fr, _unless_ I include fr as an acceptable language. That's PR6282, which is mentioned but not addressed in this patch. If I include fr as a language, I can request /index.fr, /index.fr.html, or /index.html.fr successfully. If I include fr as my preferred language, I can additionally request /, /index, and /index.html. (As well as the .h, .ht, .htm, .f variants referred to earlier). If I request /index.d, I get a 406 with links to index.html.de and index.html.dk As a faked example, consider five files in the DocumentRoot, with no special customisations to the (MIME) configuration: files a.b.c, d.e.html, g.h.i.j.k.en, m.n.o.p.q.html, s.t.html.u.v The following requests have the indicated results: GET /a -> not found GET /a.b -> not found GET /a.c -> not found GET /a.b.c -> success GET /d -> success GET /d.e -> success GET /d.h -> success GET /d.html -> success GET /d....html -> success GET /g -> not found GET /g.h -> not found GET /g.h.i.j.k -> not found GET /g.h.i.j.k.en -> success GET /g.h.i.k.j.en -> not found GET /m -> success GET /m.html -> success GET /m.o.q.p.n -> success GET /m.o.r.p.n -> not found GET /s.t.html.u.v -> success GET /s -> not found GET /s.t.html.u -> not found note that in the "not found" cases there (except for /m.o.r.p.n), the patched code does pass the file down as being potentially valid -- it's later code which decides that it doesn't know how to treat the final suffix, and fails it. As another faked example, with files ..d.f.html and .e.txt, I can successfully issue GETs for /.d, /.f, /.h, /.e and /.t, as well as things like /....t. (whether or not the final . there is punctuation). So that's it. Any comments, criticism, or ridicule related to the patch, please send my way, or to the list. All the best, f -- Francis Daly deva@daoine.org --- modules/mappers/mod_negotiation.c.orig Wed Apr 4 18:59:20 2001 +++ modules/mappers/mod_negotiation.c Thu Apr 5 20:51:13 2001 @@ -911,6 +911,9 @@ struct var_rec mime_info; struct accept_rec accept_info; void *new_var; + char *pos; + int pos_len; + int not_this_dirent; clean_var_rec(&mime_info); @@ -935,12 +938,76 @@ request_rec *sub_req; /* Do we have a match? */ - if (strncmp(dirent.name, filp, prefix_len)) { - continue; - } - if (dirent.name[prefix_len] != '.') { - continue; + + if ((pos = strchr(filp, '.'))) { + + /* Given "name.suf1.suf2.suffix", check for "name." */ + + pos_len = pos - filp + 1; + if (strncmp(dirent.name, filp, pos_len)) { + continue; + } + + not_this_dirent = 0; + filp = ++pos; + + /* Check for each internal ".sufN" from r->filename */ + while ((pos = strchr(filp, '.'))) { + --filp; + pos_len = pos - filp ; + filp[pos_len]='\0'; + if (!strstr(dirent.name, filp)) { + not_this_dirent=1; + } + + /* XXX: Right now, filp points to a suffix (encoding indicator, + * handler indicator, mime-type indicator, whatever), starting + * with a ".". If we want to do stuff, like consider that to be + * an implicit additional Accept: header, here would be a good + * place to do it. See PR6282 for an example of what I mean. + * Note, this would have to be repeated once more, just after the + * check for the final ".suffix" and before filp gets moved back + * again. + */ + + filp[pos_len] = '.'; + filp += pos_len + 1; + + if (not_this_dirent) { + /* get to next dirent */ + break; + } + } + if (not_this_dirent) { + /* reset filp */ + pos_len = strlen(filp); + filp -= prefix_len - pos_len; + /* next dirent */ + continue; + } + --filp; + pos_len = strlen(filp); + + /* Check for the final ".suffix" from r->filename */ + if (!strstr(dirent.name, filp)) { + filp -= prefix_len - pos_len; + continue; + } + filp -= prefix_len - pos_len; + + } else { + + /* Alternatively, given just "name", check for "name." + * Just like it used to be + */ + if (strncmp(dirent.name, filp, prefix_len)) { + continue; + } + if (dirent.name[prefix_len] != '.') { + continue; + } } + /* Yep. See if it's something which we have access to, and * which has a known type and encoding (as opposed to something