Hi there,
for your consideration, appended to this mail is a patch to remove
the requirements on the order of suffixes when using MultiViews /
mod_negotiation. This corresponds to (part of) PR3430.
The patch is relative to the version of mod_negotiation.c distributed
with apache-2.0.15. I believe that's identical to the one from
apache-2.0.16.
But first, some notes:
The current method takes the "file" part of r->filename (either the bit
after the final / in the URL, or the value of DirectoryIndex). First,
if the exact filename matches, mod_negotiation declines to handle it.
Second, for each file in the directory, it checks for (regex syntax)
/^file\./, and only considers ones that match.
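The current check described above can be sketched in isolation roughly as follows. This is only an illustration: current_filter is a hypothetical helper name, not a function in mod_negotiation.c, and it takes plain strings rather than the request and directory-entry structures the module really uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-alone version of the current (unpatched) test.
 * Returns 1 if the directory entry starts with the requested name
 * followed immediately by a dot, i.e. matches /^file\./, else 0. */
static int current_filter(const char *requested, const char *entry)
{
    size_t prefix_len;

    assert(requested != NULL && entry != NULL);

    prefix_len = strlen(requested);
    if (strncmp(entry, requested, prefix_len) != 0) {
        return 0;
    }
    /* Safe to index: the first prefix_len bytes matched, so entry is
     * at least prefix_len bytes long (this may be its terminator). */
    return entry[prefix_len] == '.';
}
```

With this sketch, "name" and "name.en" both select "name.en.html", while "name.html" does not.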
The patched method, for the requested file "file" (no dots), does the same
thing after a single extra if(strchr()).
However, if the r->filename is actually "file.s1.s2.sZ" (with dots), the
current way looks for /^file\.s1\.s2\.sZ\./; the patched way looks for
each of /^file\./, /\.s1/, /\.s2/, /\.sZ/. It bails out at the first
failure.
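As a rough illustration of the patched matching rule just described, here is a stand-alone sketch. It is not the patch itself: patched_filter and contains_n are hypothetical names, and unlike the patch this version never writes to the requested string (it compares bounded lengths instead of temporarily replacing a dot with a NUL). It only models the directory-entry filter, not the later checks on known types and encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if the first len bytes of needle occur
 * anywhere in hay, else 0. Callers pass len >= 1. */
static int contains_n(const char *hay, const char *needle, size_t len)
{
    for (;; hay++) {
        if (strncmp(hay, needle, len) == 0) {
            return 1;
        }
        if (*hay == '\0') {
            return 0;
        }
    }
}

/* Hypothetical stand-alone model of the patched filter. */
static int patched_filter(const char *requested, const char *entry)
{
    const char *dot;
    const char *next;
    size_t len;

    assert(requested != NULL && entry != NULL);

    dot = strchr(requested, '.');
    if (dot == NULL) {
        /* No dots: same as the current test, /^file\./ */
        len = strlen(requested);
        return strncmp(entry, requested, len) == 0 && entry[len] == '.';
    }

    /* "name." (up to and including the first dot) must match at the start */
    len = (size_t)(dot - requested) + 1;
    if (strncmp(entry, requested, len) != 0) {
        return 0;
    }

    /* Each ".suffix" must occur somewhere in the entry; stop at first miss */
    while (dot != NULL) {
        next = strchr(dot + 1, '.');
        len = next ? (size_t)(next - dot) : strlen(dot);
        if (!contains_n(entry, dot, len)) {
            return 0;
        }
        dot = next;
    }
    return 1;
}
```

For example, patched_filter("name.html", "name.en.html") accepts the entry, while patched_filter("m.o.r.p.n", "m.n.o.p.q.html") rejects it because ".r" does not occur in the real filename.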
In this case, the patched code does an extra strchr, strlen, strstr,
some pointer arithmetic and pointer movement, changes a character, and
changes it back. It does that once per dot in r->filename, per file in
the directory. I don't have numbers for how expensive that extra string
manipulation is.
Some consequences of this implementation are:
Current method: file "name.en.html" is only accessible through (partial)
URIs "name", "name.en", or "name.en.html".
Patched method: The same three work, as do "name.html" and
"name.html.en". That's good. However, so do "name.htm", "name.htm.en",
and "name.en.htm". That may or may not be considered good. Worse still,
so do "name.h", "name.h.h", "name...h.e.e..e.h.h.", and an infinite
number of similar variations. That may not be considered good.
[ side note -- most of that infinitude could be eliminated, if desired,
by (for example) checking that the length of r->filename (prefix_len, in
the code) is not more than the length of dirent.name, immediately before
the while loop which looks for dots in filp. I would consider that an
enhancement to, rather than an integral part of, the patch, so didn't
include it here. Opinions may differ ]
In each case, the content is returned with a Content-Location: header
indicating the canonical filename.
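The length check suggested in the side note above could look something like the sketch below. request_fits is a hypothetical helper used only for illustration; in the patch itself the comparison would presumably be between prefix_len and strlen(dirent.name), placed just before the while loop. It is not part of the patch as submitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical guard: a request longer than the real filename is
 * rejected up front. Returns 1 if the request is short enough to be
 * worth checking suffix by suffix, else 0. */
static int request_fits(const char *requested, const char *entry)
{
    assert(requested != NULL && entry != NULL);
    return strlen(requested) <= strlen(entry);
}
```

This would reject "name...h.e.e..e.h.h." for the file "name.en.html", but still let through shorter oddities such as "name.h.h".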
The requirements are (1) r->filename up to the first dot must match the
real filename up to the first dot; (2) each .suffix in r->filename must
exist (string match) in the real filename; (3) the real filename must
correspond to a known mime-type, encoding, etc -- which I think means
that the final suffix must be known, and only suffixes followed by known
suffixes are considered.
As a real example, testing with the apache "It worked!" page (named
index.html.LANG), if I request index.html.fr, I get the page back.
If I request index.fr.html, or just index.fr, I get back the 406 Not
Acceptable page, with a link to index.html.fr, _unless_ I include fr
as an acceptable language. That's PR6282, which is mentioned but not
addressed in this patch. If I include fr as a language, I can request
/index.fr, /index.fr.html, or /index.html.fr successfully. If I include
fr as my preferred language, I can additionally request /, /index, and
/index.html. (As well as the .h, .ht, .htm, .f variants referred to
earlier). If I request /index.d, I get a 406 with links to index.html.de
and index.html.dk.
As a faked example, consider five files in the DocumentRoot, with no
special customisations to the (MIME) configuration:
files a.b.c, d.e.html, g.h.i.j.k.en, m.n.o.p.q.html, s.t.html.u.v
The following requests have the indicated results:
GET /a -> not found
GET /a.b -> not found
GET /a.c -> not found
GET /a.b.c -> success
GET /d -> success
GET /d.e -> success
GET /d.h -> success
GET /d.html -> success
GET /d....html -> success
GET /g -> not found
GET /g.h -> not found
GET /g.h.i.j.k -> not found
GET /g.h.i.j.k.en -> success
GET /g.h.i.k.j.en -> not found
GET /m -> success
GET /m.html -> success
GET /m.o.q.p.n -> success
GET /m.o.r.p.n -> not found
GET /s.t.html.u.v -> success
GET /s -> not found
GET /s.t.html.u -> not found
Note that in the "not found" cases above (except for /m.o.r.p.n), the
patched code does pass the file down as being potentially valid --
it's later code which decides that it doesn't know how to treat the
final suffix, and fails it.
As another faked example, with files ..d.f.html and .e.txt, I can
successfully issue GETs for /.d, /.f, /.h, /.e and /.t, as well as
things like /....t. (whether or not the final . there is punctuation).
So that's it. Any comments, criticism, or ridicule related to the
patch, please send my way, or to the list.
All the best,
f
--
Francis Daly deva@daoine.org
--- modules/mappers/mod_negotiation.c.orig Wed Apr 4 18:59:20 2001
+++ modules/mappers/mod_negotiation.c Thu Apr 5 20:51:13 2001
@@ -911,6 +911,9 @@
struct var_rec mime_info;
struct accept_rec accept_info;
void *new_var;
+ char *pos;
+ int pos_len;
+ int not_this_dirent;
clean_var_rec(&mime_info);
@@ -935,12 +938,76 @@
request_rec *sub_req;
/* Do we have a match? */
- if (strncmp(dirent.name, filp, prefix_len)) {
- continue;
- }
- if (dirent.name[prefix_len] != '.') {
- continue;
+
+ if ((pos = strchr(filp, '.'))) {
+
+ /* Given "name.suf1.suf2.suffix", check for "name." */
+
+ pos_len = pos - filp + 1;
+ if (strncmp(dirent.name, filp, pos_len)) {
+ continue;
+ }
+
+ not_this_dirent = 0;
+ filp = ++pos;
+
+ /* Check for each internal ".sufN" from r->filename */
+ while ((pos = strchr(filp, '.'))) {
+ --filp;
+ pos_len = pos - filp ;
+ filp[pos_len]='\0';
+ if (!strstr(dirent.name, filp)) {
+ not_this_dirent=1;
+ }
+
+ /* XXX: Right now, filp points to a suffix (encoding indicator,
+ * handler indicator, mime-type indicator, whatever), starting
+ * with a ".". If we want to do stuff, like consider that to be
+ * an implicit additional Accept: header, here would be a good
+ * place to do it. See PR6282 for an example of what I mean.
+ * Note, this would have to be repeated once more, just after the
+ * check for the final ".suffix" and before filp gets moved back
+ * again.
+ */
+
+ filp[pos_len] = '.';
+ filp += pos_len + 1;
+
+ if (not_this_dirent) {
+ /* get to next dirent */
+ break;
+ }
+ }
+ if (not_this_dirent) {
+ /* reset filp */
+ pos_len = strlen(filp);
+ filp -= prefix_len - pos_len;
+ /* next dirent */
+ continue;
+ }
+ --filp;
+ pos_len = strlen(filp);
+
+ /* Check for the final ".suffix" from r->filename */
+ if (!strstr(dirent.name, filp)) {
+ filp -= prefix_len - pos_len;
+ continue;
+ }
+ filp -= prefix_len - pos_len;
+
+ } else {
+
+ /* Alternatively, given just "name", check for "name."
+ * Just like it used to be
+ */
+ if (strncmp(dirent.name, filp, prefix_len)) {
+ continue;
+ }
+ if (dirent.name[prefix_len] != '.') {
+ continue;
+ }
}
+
/* Yep. See if it's something which we have access to, and
* which has a known type and encoding (as opposed to something