Date: Thu, 29 Aug 1996 11:22:28 +0100 (BST)
From: Paul Sutton <paul@ukweb.com>
To: new-httpd@hyperreal.com
Subject: Re: Negotiation updates part II
In-Reply-To: <Pine.SOL.3.93.960827204424.27565A-100000@eat.organic.com>
Message-ID: <Pine.LNX.3.92.960828162910.239A-100000@star>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-new-httpd@apache.org
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

On Tue, 27 Aug 1996, Alexei Kosut wrote:
> I would prefer that HOLTMAN be undefined by default. It's very much an
> experimental protocol, and I really don't want to ship a server with it
> enabled, since it is likely to (and had better) change.

Sure. It might even be worth taking out all the Holtman-specific code,
since the draft is so wobbly.

> And there's still the bug I mentioned earlier: if you have two variants,
> foo.txt and foo.txt.gz (or foo.txt and foo.txt.en), it does not set "Vary:
> accept-encoding" (or accept-language). If you have foo.txt.gz and
> foo.txt.Z, it does.

Ok, this worked with explicit var maps, but not when built from a
directory read (the former set any 'unset' encodings and languages to "",
while the latter used NULL). Fixed.
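
To illustrate the class of bug (this is only a sketch, not the code in
mod_negotiation.c): if one code path stores "no value" as "" and another
as NULL, any comparison between variants has to treat the two alike. The
helper names has_value() and dimension_varies() below are made up for the
example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: NULL and "" both mean "no value set". */
static int has_value(const char *s)
{
    return s != NULL && *s != '\0';
}

/* Hypothetical helper: returns 1 if the n variants differ in one
 * dimension (encoding or language), i.e. a Vary entry would be needed. */
static int dimension_varies(const char **values, int n)
{
    int i;

    for (i = 1; i < n; ++i) {
        int a = has_value(values[0]);
        int b = has_value(values[i]);

        if (a != b)
            return 1;           /* one variant set, the other unset */
        if (a && strcmp(values[0], values[i]) != 0)
            return 1;           /* both set, but to different values */
    }
    return 0;
}
```

With this, foo.txt (NULL encoding) and foo.txt.gz ("gzip") are seen to
vary, and a NULL next to a "" is not mistaken for a difference.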

> If you have foo.txt and foo.txt.en, it will always send foo.txt, even if
> you have "Accept-Language: en".

I don't really like the idea of having some variants with languages, and
others without. This implies that the variants vary on language, but what
language do we consider foo.txt to have? For each of the other dimensions,
a variant with no value in that dimension is ok (no content-type means use
the server DefaultType, no charset means use ISO-8859-1, no encoding
means, er, no encoding). Language is special because there is no default.

Actually, I see that the Content-Language definition says that an entity
with no Content-Language header (which is the case with foo.txt) is
acceptable to all languages.

So perhaps the following would do:

  If no variants have an explicit content_language, leave
  all the language qualities at 1.0

  If any variant has a content_language, give the variants with
  no content_language a language quality of 0.0001

The effect of this would be that variants with no language would be
considered after variants with explicit languages, unless no variants have
languages in which case all are equally acceptable.

The patch does this, but I'm not entirely sure it is the right thing to
do.
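
Stripped of the module's data structures, the rule amounts to something
like the following. This is a simplified stand-alone sketch; struct
variant and default_lang_quality() are stand-ins invented for the
example, not the var_rec or function used in the patch.

```c
#include <stddef.h>

/* Simplified stand-in for a variant record (assumption for this sketch). */
struct variant {
    const char *content_language;   /* NULL or "" means no language */
};

/* Quality to give variants that have no language: 1.0 if no variant
 * has a language (we are not negotiating on language at all), else
 * 0.0001 so that they rank below any variant with an explicit language. */
static float default_lang_quality(const struct variant *v, int n)
{
    int i;

    for (i = 0; i < n; ++i) {
        if (v[i].content_language && *v[i].content_language)
            return 0.0001f;
    }
    return 1.0f;
}
```

So for foo.txt plus foo.txt.en the default becomes 0.0001, while for a
directory where nothing has a language it stays at 1.0.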

>                                        The opposite problem exists with
> encodings. The lack of Accept-Encoding should indicate preference for no
> encoding (although if only encoded variants are available, they should
> still be sent).

For encodings, no Accept-Encoding header means that all are acceptable,
while an _empty_ header indicates that none are acceptable. Perhaps we
should check the proto_num and assume that no Accept-Encoding header means
none are acceptable for pre-1.1 requests?

So for 1.1 requests at least, having no Accept-Encoding header means all
encodings are acceptable, and all other things being equal, Apache will
usually send an encoded variant if possible, because of the final
content-length check. We could override this in the negotiation, but sites
might prefer to send encoded entities, because of bandwidth issues. The
patch puts the code to prefer this in a #define PREFER_NO_ENCODING since
I'm not sure it ought to be the default.
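
For what it's worth, the header/protocol distinction discussed above
could be sketched like this. It is not in the patch; the enum, the
function name and the is_http11 flag are all invented for the example
(a real version would presumably derive the flag from proto_num), and
the pre-1.1 branch is only the idea floated above, not settled
behaviour.

```c
#include <stddef.h>

enum enc_policy {
    ENC_ALL_OK,       /* every encoding is acceptable */
    ENC_NONE_OK,      /* only unencoded variants are acceptable */
    ENC_CHECK_LIST    /* compare variants against the listed encodings */
};

/* accept_encoding is the raw header value, or NULL if the header
 * was not sent at all. */
static enum enc_policy encoding_policy(const char *accept_encoding,
                                       int is_http11)
{
    if (accept_encoding == NULL)
        return is_http11 ? ENC_ALL_OK : ENC_NONE_OK;
    if (*accept_encoding == '\0')
        return ENC_NONE_OK;     /* header present but empty */
    return ENC_CHECK_LIST;
}
```

The sketch does not deal with a header that contains only whitespace,
which would need trimming before the empty check.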

> And there's still the en-US/en vs. en/en-US thing. Roy seems to agree that
> if the client sends en-US Apache should grudgingly prefer en over
> languages that don't appear at all in the Accept-Language header, and this
> would be a good way of dealing with it, I think.

Well, I think Roy actually suggested that if Accept-Language: en-US was
received, then Apache should assume en; q=0.5. This might not be the best
idea, since the en-US might have a lower q value itself, or there might be
other languages with q's less than 0.5. I would prefer to give the
'assumed' en a q of 0.0001 to guarantee that any explicitly listed
languages with any q value are going to be preferred. This is set in the
patch.
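
A small stand-alone sketch shows why the size of the assumed q matters.
It is deliberately simpler than the matching in the patch (exact matches
only, plus the prefix fallback, and it checks the tag boundary, which the
patch does not); struct lang_range and lang_quality() are names made up
for the example.

```c
#include <stddef.h>
#include <string.h>

/* One item from an Accept-Language header (stand-in type). */
struct lang_range {
    const char *tag;
    float q;
};

/* Quality for a variant in language lang. An exact match returns the
 * listed q. Otherwise, if some listed range has a prefix (en in en-US)
 * equal to the variant's language or its prefix, return assumed_q.
 * Languages that match nothing get 0. */
static float lang_quality(const char *lang, const struct lang_range *acc,
                          int n, float assumed_q)
{
    float fallback = 0.0f;
    int i;

    for (i = 0; i < n; ++i) {
        const char *dash;

        if (strcmp(lang, acc[i].tag) == 0)
            return acc[i].q;

        dash = strchr(acc[i].tag, '-');
        if (dash) {
            size_t plen = (size_t)(dash - acc[i].tag);

            if (strncmp(lang, acc[i].tag, plen) == 0 &&
                (lang[plen] == '\0' || lang[plen] == '-'))
                fallback = assumed_q;
        }
    }
    return fallback;
}
```

Take "Accept-Language: en-US; q=0.3, fr; q=0.4" with variants in en and
fr. With an assumed q of 0.5 the en variant scores 0.5 and beats fr at
0.4, even though fr was asked for explicitly and en was not. With an
assumed q of 0.0001, fr wins and en is still preferred over a language
that is not mentioned at all.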

These three changes are in the patch below. This patches the
mod_negotiation created by the mod_negotiation.patch2 patch I previously
uploaded to hyperreal. This only affects mod_negotiation.c. If you would
prefer a patch against virgin 1.2-dev, I could upload that instead.

Paul
UK Web Ltd

*** mod_negotiation.c.patch2	Wed Aug 28 12:22:07 1996
--- mod_negotiation.c	Thu Aug 29 11:16:57 1996
***************
*** 69,75 ****
   * This file currently implements the draft-02, except for
   * anything to do with features and cache-control (max-age etc)
   */
! #define HOLTMAN

  /* Commands --- configuring document caching on a per (virtual?)
   * server basis...
--- 69,75 ----
   * This file currently implements the draft-02, except for
   * anything to do with features and cache-control (max-age etc)
   */
! /*#define HOLTMAN*/

  /* Commands --- configuring document caching on a per (virtual?)
   * server basis...
***************
*** 203,209 ****
      request_rec *r;
      char *dir_name;
      int accept_q;		/* 1 if an Accept item has a q= param */
!
      array_header *accepts;      /* accept_recs */
      int have_accept_header;	/* 1 if Accept-Header present */
      array_header *accept_encodings; /* accept_recs */
--- 203,211 ----
      request_rec *r;
      char *dir_name;
      int accept_q;		/* 1 if an Accept item has a q= param */
!     float default_lang_quality;	/* fiddle lang q for variants with no lang */
!
!
      array_header *accepts;      /* accept_recs */
      int have_accept_header;	/* 1 if Accept-Header present */
      array_header *accept_encodings; /* accept_recs */
***************
*** 749,758 ****

  	mime_info.sub_req = sub_req;
  	mime_info.file_name = pstrdup(neg->pool, dir_entry->d_name);
! 	if (mime_info.content_encoding = sub_req->content_encoding)
  	    str_tolower(mime_info.content_encoding);
! 	if (mime_info.content_language = sub_req->content_language)
  	    str_tolower(mime_info.content_language);

  	get_entry (neg->pool, &accept_info, sub_req->content_type);
  	set_mime_fields (&mime_info, &accept_info);
--- 751,764 ----

  	mime_info.sub_req = sub_req;
  	mime_info.file_name = pstrdup(neg->pool, dir_entry->d_name);
! 	if (sub_req->content_encoding) {
! 	    mime_info.content_encoding = sub_req->content_encoding;
  	    str_tolower(mime_info.content_encoding);
! 	}
! 	if (sub_req->content_language) {
! 	    mime_info.content_language = sub_req->content_language;
  	    str_tolower(mime_info.content_language);
+ 	}

  	get_entry (neg->pool, &accept_info, sub_req->content_type);
  	set_mime_fields (&mime_info, &accept_info);
***************
*** 928,933 ****
--- 934,967 ----
      return -1;
  }

+ /* set_default_lang_quality() sets the quality we apply to variants
+  * which have no language assigned to them. If none of the variants
+  * have a language, we are not negotiating on language, so all are
+  * acceptable, and we set the default q value to 1.0. However if
+  * some of the variants have languages, we set this default to 0.0001.
+  * The value of this default will be applied to all variants with
+  * no explicit language -- which will have the effect of making them
+  * acceptable, but only if no variants with an explicit language
+  * are acceptable. The default q value set here is assigned to variants
+  * with no language type in set_language_quality().
+  */
+
+ void set_default_lang_quality(negotiation_state *neg)
+ {
+     var_rec *avail_recs = (var_rec *)neg->avail_vars->elts;
+     int j;
+
+     for (j = 0; j < neg->avail_vars->nelts; ++j) {
+         var_rec *variant = &avail_recs[j];
+ 	if (variant->content_language && *variant->content_language) {
+ 	    neg->default_lang_quality = 0.0001;
+ 	    return;
+ 	}
+     }
+
+     neg->default_lang_quality = 1.0;
+ }
+
  /* Set the language_quality value in the variant record. Also
   * assigns lang_index for back-compat.
   *
***************
*** 938,943 ****
--- 972,990 ----
   * or as far as the prefix marker (-). If two or more languages
   * match, use the longest string from the Accept-Language: header
   * (see HTTP/1.1 [14.4])
+  *
+  * If the variant has no language and we have no Accept-Language
+  * items, leave the quality at 1.0 and return.
+  *
+  * If the variant has no language, we use the default as set by
+  * set_default_lang_quality() (1.0 if we are not negotiating on
+  * language, 0.0001 if we are).
+  *
+  * Following the setting of the language quality, we drop through to
+  * set the old 'lang_index'. This is set based on either the order
+  * of the languages on the Accept-Language header, or the
+  * order on the LanguagePriority directive. This is only used
+  * in the negotiation if the language qualities tie.
   */

  void set_language_quality(negotiation_state *neg, var_rec *variant)
***************
*** 957,970 ****
          conf = (neg_dir_config *) get_module_config (neg->r->per_dir_config,
                                                       &negotiation_module);

!     if (!lang || !*lang)
          return;                 /* variant has no assigned language */

      p = strchr(lang, '-');      /* find prefix part (if any) */
      if (p)
!         prefixlen = p - lang;

-     if (naccept) {
          accs = (accept_rec *)neg->accept_langs->elts;

          for (i = 0; i < neg->accept_langs->nelts; ++i) {
--- 1004,1025 ----
          conf = (neg_dir_config *) get_module_config (neg->r->per_dir_config,
                                                       &negotiation_module);

!     if (naccept == 0 && (!lang || !*lang))
          return;                 /* variant has no assigned language */

      p = strchr(lang, '-');      /* find prefix part (if any) */
      if (p)
!         prefixlen = p - lang;
!
!     if (!lang || !*lang) {
!         /* This variant has no content-language, so use the default
! 	 * quality factor for variants with no content-language
! 	 * (previously set by set_default_lang_quality()). */
!         variant->lang_quality = neg->default_lang_quality;
!     }
!     else if (naccept) {
! 	float fiddle_q = 0.0;

          accs = (accept_rec *)neg->accept_langs->elts;

          for (i = 0; i < neg->accept_langs->nelts; ++i) {
***************
*** 973,992 ****
                  continue;
              }

!             /* Find language */
              if ((!strcmp (lang, accs[i].type_name) ||
                   (prefixlen &&
!                   !strncmp(lang, accs[i].type_name, prefixlen))) &&
                  ((len = strlen(accs[i].type_name)) >
                                       longest_lang_range_len)) {
                  longest_lang_range_len = len;
                  best = &accs[i];
              }

          }
!
          variant->lang_quality = best ? best->quality :
!                                 (star ? star->quality : 0.0);
      }

      /* Now set the old lang_index field */
--- 1028,1073 ----
                  continue;
              }

!             /* Find language. We match if either the variant language
! 	     * tag exactly matches, or the prefix of the tag up to the
! 	     * '-' character matches the whole of the language in the
! 	     * Accept-Language header */
              if ((!strcmp (lang, accs[i].type_name) ||
                   (prefixlen &&
!                   !strncmp(lang, accs[i].type_name, prefixlen) &&
! 		  (accs[i].type_name[prefixlen] == '\0'))) &&
                  ((len = strlen(accs[i].type_name)) >
                                       longest_lang_range_len)) {
                  longest_lang_range_len = len;
                  best = &accs[i];
              }
+
+ 	    if (! best) {
+ 	        /* The next bit is a fiddle. Some browsers might be
+ 		 * configured to send more specific language ranges
+ 		 * than desirable. For example, an Accept-Language of
+ 		 * en-US should never match variants with languages en
+ 		 * or en-GB. But US English speakers might pick en-US
+ 		 * as their language choice.  So this fiddle checks if
+ 		 * the language range has a prefix, and if so, it
+ 		 * matches variants which match that prefix with a
+ 		 * priority of 0.0001. So a request for en-US would
+ 		 * match variants of types en and en-GB, but at much
+ 		 * lower priority than matches of en-US directly, or
+ 		 * of any other language listed on the Accept-Language
+ 		 * header
+ 		 */
+ 	        if (p = strchr(accs[i].type_name, '-')) {
+ 		    int plen = p - accs[i].type_name;
+ 		    if (!strncmp(lang, accs[i].type_name, plen))
+ 			fiddle_q = 0.0001;
+ 		}
+ 	    }

          }
!
          variant->lang_quality = best ? best->quality :
! 	                     (star ? star->quality : fiddle_q);
      }

      /* Now set the old lang_index field */
***************
*** 1324,1331 ****
      }

      /* encoding -- can only be 1 or 0, and if 0 we eliminated this
!      * variant at the start of this function */

      /* charset */
      if (variant->charset_quality < best->charset_quality)
          return 0;
--- 1405,1424 ----
      }

      /* encoding -- can only be 1 or 0, and if 0 we eliminated this
!      * variant at the start of this function. However we might
!      * prefer variants with no encoding over those with encoding
!      * If so, define in the following code.  */
! /*#define PREFER_NO_ENCODING*/
! #ifdef PREFER_NO_ENCODING
!     if (!*best->content_encoding && *variant->content_encoding)
! 	return 0;
!     if (*best->content_encoding && !*variant->content_encoding) {
! 	*p_bestq = q;
! 	return 1;
!     }
! #endif

+
      /* charset */
      if (variant->charset_quality < best->charset_quality)
          return 0;
***************
*** 1355,1360 ****
--- 1448,1455 ----
      enum algorithm_results algorithm_result = na_not_applied;

      var_rec *avail_recs = (var_rec *)neg->avail_vars->elts;
+
+     set_default_lang_quality(neg);

      /*
       * Find the 'best' variant