httpd-dev mailing list archives

From Rob Hartill <r...@imdb.com>
Subject New changes (was This patch adds charset= handling for .var files) (fwd)
Date Thu, 02 May 1996 01:23:13 GMT

not acked


Message-Id: <199605010039.EAA01061@astral.msk.su>
Subject: New changes (was This patch adds charset= handling for .var files)
To: robh@imdb.com
Date: Wed, 1 May 1996 04:39:52 +0400 (MSD)
Cc: nms@nns.ru (Nickolay Saukh), apache-bugs@mail.apache.org (Apache Team)
In-Reply-To: <199604271758.LAA07882> from "Rob Hartill" at "Apr 27, 96 11:58:35 am"
From: =?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?= (aka Andrey A. Chernov, Black Mage)
<ache@astral.msk.su>
Return-Receipt-To: ache@astral.msk.su
X-Class: Fast
Precedence: special-delivery
X-Mailer: ELM [version 2.4ME+ PL15 (25)]
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=ELM830911192-1041-0_
Content-Transfer-Encoding: 7bit


--ELM830911192-1041-0_
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

> 
> thanks, we'll take a look at your changes.
> 
> regards,
> rob

I have now improved my changes to better reflect real life. Today
only a few clients send an Accept-Charset header field; most of
them don't bother. My new changes add a configurable way to guess
the valid charset(s) from the User-Agent field. If the Accept-Charset
field is missing, any charset is acceptable according to
draft-ietf-http-v11-spec-02, which means the guessed charsets are
acceptable too. The guessing mechanism is flexible enough that it
practically covers the charset-per-operating-system case. It is
badly needed for Russian users, because each operating system here
has its own Russian charset and most clients don't specify the
Accept-Charset header field at all. The patch against Apache 1.0.5
is attached below; it includes both my previous and my new changes.
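
To illustrate the guessing step described above, here is a minimal,
standalone sketch. It is not the patch itself: glob_match() is a
hypothetical stand-in for Apache's strcmp_match() (which the patch
treats as returning zero on a match, whereas this sketch returns 1),
and struct guess_rule, guess_charset() and the second rule in the
usage note are names and values invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Apache's strcmp_match(): returns 1 when
 * `str` matches `pat`, where '*' matches any run of characters
 * (including none) and '?' matches exactly one character. */
static int glob_match(const char *str, const char *pat)
{
    assert(str != NULL && pat != NULL);
    for (; *pat != '\0'; ++str, ++pat) {
        if (*pat == '*') {
            while (*pat == '*')
                ++pat;                  /* collapse runs of '*' */
            if (*pat == '\0')
                return 1;               /* trailing '*' matches the rest */
            for (; *str != '\0'; ++str)
                if (glob_match(str, pat))
                    return 1;
            return 0;
        }
        if (*str == '\0')
            return 0;                   /* pattern is longer than input */
        if (*pat != '?' && *pat != *str)
            return 0;
    }
    return *str == '\0';
}

/* One GuessCharset-style rule: User-Agent pattern -> Accept-Charset value.
 * (Illustrative type; the patch stores these pairs in an Apache table.) */
struct guess_rule {
    const char *ua_pattern;
    const char *accept_charset;
};

/* Returns the client's own Accept-Charset value when present; otherwise
 * the value of the first rule whose pattern matches the User-Agent;
 * otherwise NULL, meaning "nothing known, any charset is acceptable". */
static const char *guess_charset(const struct guess_rule *rules, size_t n,
                                 const char *accept_charset_hdr,
                                 const char *user_agent)
{
    size_t i;

    if (accept_charset_hdr != NULL)
        return accept_charset_hdr;      /* never override the client */
    if (user_agent == NULL)
        return NULL;
    for (i = 0; i < n; ++i)
        if (glob_match(user_agent, rules[i].ua_pattern))
            return rules[i].accept_charset;
    return NULL;
}
```

With the rule from the srm.conf example, "Mozilla/* (X11;*" mapped to
"koi8-r; q=0.8", a request from "Mozilla/2.0 (X11; I; FreeBSD 2.1 i386)"
that carries no Accept-Charset field would be treated as if it had sent
"koi8-r; q=0.8". A client that does send Accept-Charset is left alone.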
Regards.

> >Choosing the right document charset according to the client's
> >Accept-Charset header field is a very pressing problem for countries
> >with several active code tables (like Russia or Japan). .var files are
> >the perfect place to handle such cases, so I added the needed
> >functionality. This patch conforms to draft-ietf-http-v11-spec-02 for
> >the Accept-Charset syntax and to draft-holtman-http-negotiation-00 for
> >the qc= parameter. Please apply it, or at least tell me about any
> >problems you find.
> >Thanks in advance.
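
As a rough illustration of how a variant's charset is scored against
the Accept-Charset list, here is a standalone sketch that mirrors the
logic of charset_quality() in the attached patch. The type
struct accepted_charset and the function name charset_quality_sketch()
are made up for this example; the patch itself works on Apache's
accept_rec and var_rec structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed Accept-Charset entry (illustrative type only). */
struct accepted_charset {
    const char *name;       /* lower-case charset name */
    float quality;          /* q value, 0.0 .. 1.0 */
};

/* Returns the quality factor for a variant's charset:
 *  - no Accept-Charset entries at all: everything is acceptable (1.0);
 *  - a variant with no charset is treated as iso-8859-1;
 *  - a listed charset gets its q value;
 *  - an unlisted iso-8859-1 is still acceptable (1.0);
 *  - anything else is rejected (0.0). */
static float charset_quality_sketch(const struct accepted_charset *accs,
                                    size_t n, const char *variant_charset)
{
    size_t i;

    assert(variant_charset != NULL);
    if (n == 0)
        return 1.0f;
    if (*variant_charset == '\0')
        variant_charset = "iso-8859-1";     /* default charset */
    for (i = 0; i < n; ++i)
        if (strcmp(variant_charset, accs[i].name) == 0)
            return accs[i].quality;
    if (strcmp(variant_charset, "iso-8859-1") == 0)
        return 1.0f;
    return 0.0f;
}
```

The result is multiplied into the variant's overall quality, so a
variant whose charset scores 0.0 drops out of the negotiation.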

-- 
Andrey A. Chernov        : And I rest so composedly,  /Now, in my bed,
ache@astral.msk.su       : That any beholder  /Might fancy me dead -
http://dt.demos.su/~ache : Might start at beholding me,  /Thinking me dead.
RELCOM Team,FreeBSD Team :         E.A.Poe         From "For Annie" 1849

--ELM830911192-1041-0_
Content-Type: text/plain
Content-Disposition: attachment; filename=apache.patch
Content-Description: apache.patch
Content-Transfer-Encoding: 7bit

*** conf/srm.conf-dist.bak	Sat Feb 17 11:32:10 1996
--- conf/srm.conf-dist	Wed May  1 04:04:43 1996
***************
*** 109,114 ****
--- 109,121 ----
  
  LanguagePriority en fr de
  
+ # GuessCharset allows you to do charset guessing for clients which
+ # forget to specify the Accept-Charset header field. Guessing is based on
+ # a User-Agent header field pattern.
+ # Format: GuessCharset user-agent_pattern accept-charset_value
+ # user-agent_pattern may contain '*' and '?' shell meta-characters
+ # Example: GuessCharset "Mozilla/* (X11;*" "koi8-r; q=0.8"
+ 
  # Redirect allows you to tell clients about documents which used to exist in
  # your server's namespace, but do not anymore. This allows you to tell the
  # clients where to look for the relocated document.
*** src/mod_negotiation.c.old	Wed May  1 04:09:49 1996
--- src/mod_negotiation.c	Wed May  1 04:10:05 1996
***************
*** 71,76 ****
--- 71,77 ----
  
  typedef struct {
      array_header *language_priority;
+     table *charset_patterns;    /* Added with GuessCharset... */
  } neg_dir_config;
  
  module negotiation_module;
***************
*** 81,86 ****
--- 82,88 ----
        (neg_dir_config *) palloc (p, sizeof (neg_dir_config));
  
      new->language_priority = make_array (p, 4, sizeof (char *));
+     new->charset_patterns = make_table (p, 4);
      return new;
  }
  
***************
*** 94,99 ****
--- 96,103 ----
      /* give priority to the config in the subdirectory */
      new->language_priority = append_arrays (p, add->language_priority,
  					    base->language_priority);
+     new->charset_patterns = overlay_tables (p, add->charset_patterns,
+ 					    base->charset_patterns);
      return new;
  }
  
***************
*** 114,119 ****
--- 118,129 ----
      return NULL;
  }
  
+ char *set_guess_charset (cmd_parms *cmd, neg_dir_config *m, char *pattern, char *charset)
+ {
+     table_set (m->charset_patterns, pattern, charset);
+     return NULL;
+ }
+ 
  int do_cache_negotiated_docs (server_rec *s)
  {
      return (get_module_config (s->module_config, &negotiation_module) != NULL);
***************
*** 124,129 ****
--- 134,140 ----
      NULL },
  { "LanguagePriority", set_language_priority, NULL, OR_FILEINFO, ITERATE,
      NULL },
+ { "GuessCharset", set_guess_charset, NULL, OR_FILEINFO, TAKE2, NULL },
  { NULL }
  };
  
***************
*** 139,145 ****
--- 150,158 ----
  
  typedef struct accept_rec {
      char *type_name;
+     char *charset;
      float quality;
+     float qc;
      float max_bytes;
      float level;
  } accept_rec;
***************
*** 168,175 ****
--- 181,190 ----
      char *file_name;
      char *content_encoding;
      char *content_language;
+     char *charset;
      float level;		/* Auxiliary to content-type... */
      float qs;
+     float qc;
      float bytes;
      int lang_index;
      int is_pseudo_html;		/* text/html, *or* the INCLUDES_MAGIC_TYPEs */
***************
*** 194,199 ****
--- 209,215 ----
      array_header *accepts;	/* accept_recs */
      array_header *accept_encodings;	/* accept_recs */
      array_header *accept_langs;	/* accept_recs */
+     array_header *accept_charsets; /* accept_recs */
      array_header *avail_vars;	/* available variants */
  } negotiation_state;
  
***************
*** 208,218 ****
--- 224,236 ----
      mime_info->file_name = "";
      mime_info->content_encoding = "";
      mime_info->content_language = "";
+     mime_info->charset = "";
  
      mime_info->is_pseudo_html = 0.0;
      mime_info->level = 0.0;
      mime_info->level_matched = 0.0;
      mime_info->qs = 0.0;
+     mime_info->qc = 0.0;
      mime_info->quality = 0.0;
      mime_info->bytes = 0;
      mime_info->lang_index = -1;
***************
*** 225,231 ****
--- 243,251 ----
  void set_mime_fields (var_rec *var, accept_rec *mime_info)
  {
      var->type_name = mime_info->type_name;
+     var->charset = mime_info->charset;
      var->qs = mime_info->quality;
+     var->qc = mime_info->qc;
      var->quality = mime_info->quality; /* Initial quality is just qs */
      var->level = mime_info->level;
  
***************
*** 295,301 ****
--- 315,323 ----
  char *get_entry (pool *p, accept_rec *result, char *accept_line)
  {
      result->quality = 1.0;
+     result->qc = 1.0;
      result->max_bytes = 0.0;
+     result->charset = "";
      result->level = 0.0;
      
      /* Note that this handles what I gather is the "old format",
***************
*** 351,361 ****
--- 373,393 ----
  	if (parm[0] == 'q'
  	    && (parm[1] == '\0' || (parm[1] == 's' && parm[2] == '\0')))
  	    result->quality = atof(cp);
+ 	else if (parm[0] == 'q' && parm[1] == 'c' && parm[2] == '\0')
+ 	    result->qc = atof(cp);
  	else if (parm[0] == 'm' && parm[1] == 'x' &&
  		 parm[2] == 'b' && parm[3] == '\0')
  	    result->max_bytes = atof(cp);
  	else if (parm[0] == 'l' && !strcmp (&parm[1], "evel"))
  	    result->level = atof(cp);
+ 	else if (parm[0] == 'c' && !strcmp (&parm[1], "harset")) {
+ 	    result->charset = cp;
+ 	    if ((cp = strchr (result->charset, '\n')) != NULL)
+ 		*cp = '\0';
+ 	    if ((cp = strrchr (result->charset, '"')) != NULL)
+ 		*cp = '\0';
+ 	    str_tolower (result->charset);
+ 	}
      }
  
      if (*accept_line == ',') ++accept_line;
***************
*** 388,393 ****
--- 420,453 ----
   * Handling header lines from clients...
   */
  
+ char *get_accept_charset (request_rec *r)
+ {
+     char *s;
+ 
+     if ((s = table_get (r->headers_in, "Accept-charset")) == NULL) {
+ 	neg_dir_config *conf =
+ 	     (neg_dir_config *) get_module_config (r->per_dir_config,
+ 						   &negotiation_module);
+ 	if (conf != NULL) {
+ 	    char *agent = table_get (r->headers_in, "User-Agent");
+ 
+ 	    if (agent != NULL) {
+ 		table *t = conf->charset_patterns;
+ 		table_entry *elts = (table_entry *)t->elts;
+ 		int i;
+ 
+ 		for (i = 0; i < t->nelts; ++i) {
+ 		    if (!strcmp_match (agent, elts[i].key)) {
+ 			s = elts[i].val;
+ 			break;
+ 		    }
+ 		}
+ 	    }
+ 	}
+     }
+     return s;
+ }
+ 
  negotiation_state *parse_accept_headers (request_rec *r)
  {
      negotiation_state *new =
***************
*** 403,408 ****
--- 463,470 ----
        do_header_line (r->pool, table_get (hdrs, "Accept-encoding"));
      new->accept_langs =
        do_header_line (r->pool, table_get (hdrs, "Accept-language"));
+     new->accept_charsets =
+       do_header_line (r->pool, get_accept_charset (r));
      new->avail_vars = make_array (r->pool, 40, sizeof (var_rec));
  
      return new;
***************
*** 421,428 ****
--- 483,492 ----
    
      new_accept->type_name = CGI_MAGIC_TYPE;
      new_accept->quality = prefer_scripts ? 1e-20 : 1e20;
+     new_accept->qc = 1.0;
      new_accept->level = 0.0;
      new_accept->max_bytes = 0.0;
+     new_accept->charset = "";
  
      if (neg->accepts->nelts > 1) return;
      
***************
*** 430,437 ****
--- 494,503 ----
      
      new_accept->type_name = "*/*";
      new_accept->quality = 1.0;
+     new_accept->qc = 1.0;
      new_accept->level = 0.0;
      new_accept->max_bytes = 0.0;
+     new_accept->charset = "";
  }
  
  /*****************************************************************
***************
*** 735,740 ****
--- 801,833 ----
      return OK;
  }
  
+ float charset_quality (negotiation_state *neg, var_rec *avail)
+ {
+     accept_rec *accs;
+     char *charset;
+     int i;
+ 
+     /* If no Accept-Charset is present, everything is acceptable */
+ 
+     if (!neg->accept_charsets->nelts)
+ 	return 1.0;
+ 
+     charset = avail->charset;
+     if (!*charset)
+ 	charset = "iso-8859-1"; /* default */
+ 
+     accs = (accept_rec *)neg->accept_charsets->elts;
+ 
+     for (i = 0; i < neg->accept_charsets->nelts; ++i)
+ 	if (!strcmp (charset, accs[i].type_name))
+ 	    return accs[i].quality;
+ 
+     if (!strcmp (charset, "iso-8859-1"))
+ 	return 1.0;
+ 	    
+     return 0.0;
+ }
+ 
  /* This code implements a piece of the tie-breaking algorithm between
   * variants of equal quality.  This piece is the treatment of variants
   * of the same base media type, but different levels.  What we want to
***************
*** 966,977 ****
  	for (j = 0; j < neg->avail_vars->nelts; ++j) {
  	    
  	    var_rec *variant = &avail_recs[j];
! 	    float q = type->quality * variant->quality;
  		
  	    /* If we've already rejected this variant, don't waste time */
  	    
  	    if (q == 0.0) continue;	
  	    
  	    /* If media types don't match, forget it.
  	     * (This includes the level check).
  	     */
--- 1059,1074 ----
  	for (j = 0; j < neg->avail_vars->nelts; ++j) {
  	    
  	    var_rec *variant = &avail_recs[j];
! 	    float q = type->quality * variant->quality * variant->qc;
  		
  	    /* If we've already rejected this variant, don't waste time */
  	    
  	    if (q == 0.0) continue;	
  	    
+ 	    q *= charset_quality(neg, variant);
+ 
+ 	    if (q == 0.0) continue;	
+ 
  	    /* If media types don't match, forget it.
  	     * (This includes the level check).
  	     */
*** src/util_script.c.orig	Sat Feb 17 11:32:10 1996
--- src/util_script.c	Wed May  1 03:42:46 1996
***************
*** 60,65 ****
--- 60,67 ----
  #include "http_core.h"		/* For document_root.  Sigh... */
  #include "util_script.h"
  
+ extern char *get_accept_charset (request_rec *r);
+ 
  /*
   * Various utility functions which are common to a whole lot of
   * script-type extensions mechanisms, and might as well be gathered
***************
*** 125,131 ****
      server_rec *s = r->server;
      conn_rec *c = r->connection;
      
!     char port[40],*env_path;
      
      array_header *hdrs_arr = table_elts (r->headers_in);
      table_entry *hdrs = (table_entry *)hdrs_arr->elts;
--- 127,133 ----
      server_rec *s = r->server;
      conn_rec *c = r->connection;
      
!     char port[40],*env_path,*accept_charset;
      
      array_header *hdrs_arr = table_elts (r->headers_in);
      table_entry *hdrs = (table_entry *)hdrs_arr->elts;
***************
*** 150,158 ****
--- 152,164 ----
  	    table_set (e, "CONTENT_LENGTH", hdrs[i].val);
  	else if (!strcasecmp (hdrs[i].key, "Authorization"))
  	    continue;
+ 	else if (!strcasecmp (hdrs[i].key, "Accept-charset"))
+ 	    continue;   /* do it later */
  	else
  	    table_set (e, http2env (r->pool, hdrs[i].key), hdrs[i].val);
      }
+     if ((accept_charset = get_accept_charset (r)) != NULL)
+ 	table_set (e, "HTTP_ACCEPT_CHARSET", accept_charset);
      
      sprintf(port, "%d", s->port);
  

--ELM830911192-1041-0_--

----- End of forwarded message from =?KOI8-R?Q?=E1=CE=C4=D2=C5=CA_=FE=C5=D2=CE=CF=D7?= -----
