httpd-dev mailing list archives

From (Florent Guillaume)
Subject Re: Patches to handle content-language
Date Tue, 18 Jul 1995 00:07:55 GMT
> I really like this, but what resolves name collisions and missing 
> info between type, lang, and encoding?  For example, if I decide to name 
> all my Framemaker documents .fr, what happens to 
>  If type, lang, and encoding shared the 
> same namespace, *no* problem.  In this case, we're using filename 
> extensions to indicate meta-information other than content-type, which 
> I'm certainly comfortable with, but the collision issue should be 
> resolved somehow.
> Also, it would be tremendous if I could have the flexibility to negotiate 
> on file type and language and encoding by specifying only the meta-info I 
> want in the filename - in other words, let's say I have documents in all 
> the possible variations of
> basename.[html,txt,pdf].[en,fr,jp].[gz,Z,uu]
> Right now with content-negotiation, if I have an index.html and an 
> index.html3, then I can simply point a resource locator to "index" and 
> negotiation happens, but I can also defeat negotiation by explicitly 
> linking to "index.html3" if I wanted to make sure someone got the 3.0 
> version.  
> Let's say for the above 9 versions of the document I wanted to 
> be able to specify which variables are mandatory.  If I didn't care at 
> all which document was fetched, I'd create a link to "basename".  If I 
> wanted specifically the gzip'd french PDF, I'd make a link to 
> "".  Now, let's say I want to make a link to all 
> french variants explicit, yet let the client/server negotiate on their 
> own as to encoding and content-type preferences.  I'd like to then link 
> to "".  Or, I specifically want the gzip'd PDFs, but I 
> don't care what language: "basename.pdf.gz".  
> Thoughts?  If we ensure there's no namespace collisions between mime 
> type extensions and filename extensions and encoding extension then this 
> is easy.  If not....

Concerning namespace collision : it's certainly a problem.
Currently the behaviour I have is :
	-> french
	-> french, framemaker
	-> english, framemaker
This is because the code starts with the last suffix and moves to the
left, looking for an encoding, then a language, then a type.
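
A rough sketch of that right-to-left scan, in Python rather than the
server's C.  The suffix tables, the function name classify() and the
file names are made up for illustration; the real server takes its
mappings from configuration, and its code may differ in detail.

```python
# Hypothetical suffix tables, for illustration only.
ENCODINGS = {"gz": "x-gzip", "Z": "x-compress"}
LANGUAGES = {"en": "english", "fr": "french"}
TYPES = {"html": "text/html", "fr": "framemaker"}


def classify(filename):
    """Scan suffixes from the right: encoding, then language, then type."""
    suffixes = filename.split(".")[1:]  # everything after the first dot
    result = {"encoding": None, "language": None, "type": None}
    for kind, table in (("encoding", ENCODINGS),
                        ("language", LANGUAGES),
                        ("type", TYPES)):
        # Consume the rightmost remaining suffix only if it fits this slot.
        if suffixes and suffixes[-1] in table:
            result[kind] = table[suffixes.pop()]
    return result
```

With these tables, classify("doc.fr") reports only a language (french),
while classify("doc.fr.en") reports english and framemaker: a suffix
that is in both tables is read as a language when it comes last.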

What should be done ?  Forbidding any namespace collision would be a bit
exaggerated, because (as you showed) it can very well happen that a
content-type is also an abbreviation for a language.  So I think the
content-type should be given priority over the content-language,
somehow (more later).

Now on the topic of missing info for the negotiation.  What you're
describing is exactly the behaviour of the CERN server :

Supposing you ask for "basename.pdf.gz", CERN first extracts the
basename of the requested file, "basename", and the associated suffixes,
"pdf" and "gz" (the ordering of suffixes isn't important for CERN).  Then
it looks in the requested directory for all files with the same
"basename", and for each one analyses the suffixes.  It eliminates all
the files in which the requested suffixes are not present, and is left
in our case with "basename.pdf.en.gz", "", and
"".  The prioritizing between these three is made by
the usual quality assessment.  But before this last quality assessment,
the suffixes had no meaning, so the files "" and
"" are confused by the server.  It's unclear to me
how we could make the distinction : keep all suffixes unknown to the
server in the "basename" part ?
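
The elimination step described above could be sketched like this
(Python, illustration only).  This is my reading of the CERN behaviour,
not its actual code; split_name() and cern_candidates() are
hypothetical names, and the quality assessment that follows is left out.

```python
def split_name(filename):
    """Return (basename, set of suffixes); suffix order is ignored."""
    base, *suffixes = filename.split(".")
    return base, set(suffixes)


def cern_candidates(requested, directory_listing):
    """Keep files with the same basename that carry every requested suffix."""
    base, wanted = split_name(requested)
    matches = []
    for name in directory_listing:
        other_base, present = split_name(name)
        # Suffixes are compared as plain strings; they have no meaning yet.
        if other_base == base and wanted <= present:
            matches.append(name)
    return matches
```

For a made-up directory holding "basename.pdf.en.gz",
"basename.html.en.gz", "basename.pdf.fr.Z" and "basename.gz.pdf", a
request for "basename.pdf.gz" keeps "basename.pdf.en.gz" and
"basename.gz.pdf", since only the presence of each suffix matters.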

Instead of this, the current behaviour of Apache in MultiViews is to
look for files that have the same beginning as the requested filename
(not simply the basename) with additional suffixes, and to do typing of
suffixes in a fixed order.
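
As a sketch (Python, illustration only), the candidate search amounts
to a prefix match on the requested name.  multiviews_candidates() is a
hypothetical name, and I assume here that the exact file does not exist
and that the typing of the extra suffixes happens afterwards.

```python
def multiviews_candidates(requested, directory_listing):
    """Keep files whose name is the requested name plus extra suffixes."""
    prefix = requested + "."
    return [name for name in directory_listing if name.startswith(prefix)]
```

So a request for "basename.pdf" can find "basename.pdf.en.gz", but a
request for "basename.pdf.gz" cannot, because the missing language
suffix sits in the middle of the name.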

Doing things a la CERN is not difficult (and I'll probably write a patch
for this), but it should be a little bit slower than what we have now.
This may not be a problem if the majority of files are accessed by exact
name, or if the directories have a small number of files.  Also do we
keep the fact that the order of suffixes is unimportant
(i.e. file.txt.gz and file.gz.txt both work with CERN) ?  I have mixed
feelings : I like both and, but I think the
encoding should come last (this reflects the way gzip and compress work).

Note that the problem of namespace collision still exists with the CERN
behaviour : suppose the files you have are all 27 (not 9, BTW) variations of


Then how do we treat a request for "" ?  If we arrange to
have content-type > content-language, then this is a request for the
Framemaker version, in any language.  But then how do we ask for a
gzipped French version of the document in any type available ?  It can be
done using an Accept-Language: fr header, and requesting "basename.gz",
but this is cheating, we want something in the URL only.


