httpd-users mailing list archives

From "William A. Rowe, Jr." <>
Subject Re: [users@httpd] I18N, HTTP 2.0 ?
Date Sat, 18 Oct 2008 19:22:22 GMT
Comments inline.  You painted this situation today with an overly broad
brush; there are some remaining issues, but they are much narrower than
you identify.

André Warnier wrote:
> It is becoming urgent to create a new HTTP standard/version/revision,
> that would be organised around Unicode as a default character set, and
> UTF-8 as a default encoding.

Your search was futile because the place to look is the internet-drafts,
where there are proposals in this sphere.  Also watch the dependencies of
the HTTP draft; many of those have also evolved and are beginning to solve
the UTF-8 situation.

> Here are some areas where these problems appear :
> - the encoding of URLs.

That is not a problem.  URLs are essentially ASCII, and the meaning of
bytes with the high-order bit set is undefined.  So from a presentation
perspective it can be a problem, but technically and operationally it is
not.  The only way to represent URLs in the spirit of their design is to
%-encode the high-bit characters for presentation.  The underlying bytes
can be UTF-8 or ISO-8859-1 (not either-or, but the administrator's choice),
and the encoded URL is easily typed in from hardcopy (e.g. the tag on a TV
commercial) by anyone using any character set who has access to the ASCII
subset.
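
As a minimal sketch of that point (Python's standard urllib is used here
purely for illustration, and the path is a made-up example), the same
character yields different %-encoded bytes depending on which charset the
administrator picked, while both results stay inside the ASCII subset:

```python
from urllib.parse import quote, unquote

# Hypothetical path with one non-ASCII character (e with acute accent).
path = "/caf\u00e9"

# The administrator's choice of charset decides which bytes get %-encoded.
utf8_url = quote(path, encoding="utf-8")         # two bytes: C3 A9
latin1_url = quote(path, encoding="iso-8859-1")  # one byte: E9

print(utf8_url)    # /caf%C3%A9
print(latin1_url)  # /caf%E9

# Decoding only round-trips if the reader assumes the same charset.
print(unquote(latin1_url, encoding="iso-8859-1") == path)  # True
```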

Using "UTF-8" alone is not enough; to accept arbitrary characters is
to ignore the fact that the same visual reference can be entered by the
user in multiple representations, often not entirely synonymous.  It's
to ignore the issue of canonical forms when we are lucky enough to have
an astute reader.  So %-encoding is the only safe data entry format from
the sensory world to the browser URL bar.
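
The canonical-forms issue can be illustrated with a short sketch, assuming
Unicode normalization forms NFC and NFD as the "multiple representations"
in question (that reading is an assumption for the example).  The two
strings look identical on screen but %-encode to different URLs:

```python
import unicodedata
from urllib.parse import quote

word = "\u00e9"  # e with acute accent

nfc = unicodedata.normalize("NFC", word)  # one code point, U+00E9
nfd = unicodedata.normalize("NFD", word)  # 'e' followed by combining U+0301

# Same glyph for the reader, different bytes on the wire.
print(quote(nfc))  # %C3%A9
print(quote(nfd))  # e%CC%81
```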

> - the encoding of HTTP headers.

Headers?  I hope you mean header values.  *TEXT values clearly declare
how to shift to utf-8, but there's an ongoing discussion of how to fix
or broaden or clarify this on the http-wg list.
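
For reference, RFC 2616 section 2.2 points at RFC 2047 encoded-words as the
way to carry non-ISO-8859-1 characters in *TEXT.  A minimal sketch of that
mechanism, using Python's email.header module only as a convenient RFC 2047
implementation (the value is a made-up example, and whether deployed user
agents and servers honor this in HTTP is a separate question):

```python
from email.header import Header, decode_header, make_header

value = "Andr\u00e9"  # hypothetical header value with one non-ASCII character

# Wrap the value in an RFC 2047 encoded-word; the result is pure ASCII.
encoded = Header(value, charset="utf-8").encode()
print(encoded)

# The receiver reverses the process using the charset named in the word.
decoded = str(make_header(decode_header(encoded)))
print(decoded == value)  # True
```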

> - the encoding of user credentials in browser-side Basic and Digest
> authentication dialogs, and their transmission to the server.

That is a side effect of the HTTP headers question, and beyond that it's
a UI design issue.
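
A small sketch of why the two are linked, assuming the usual Basic scheme
of base64 over "user:password" (the helper name and credentials below are
hypothetical): the charset the browser picks before base64 encoding changes
the header value, and nothing in the header says which one was used.

```python
import base64

def basic_credentials(user, password, charset):
    """Build a Basic Authorization header value (hypothetical helper)."""
    token = f"{user}:{password}".encode(charset)
    return "Basic " + base64.b64encode(token).decode("ascii")

# Same user-visible credentials, two different browser-side charsets.
utf8_header = basic_credentials("andr\u00e9", "secret", "utf-8")
latin1_header = basic_credentials("andr\u00e9", "secret", "iso-8859-1")

print(utf8_header)
print(latin1_header)
print(utf8_header == latin1_header)  # False: the server sees different bytes
```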

> - the encoding of input elements from html forms, as transmitted by a
> client to a server, and the interpretation of ditto data by the server

RFC 2616, the HTTP spec, is clear on this and needs no further clarification:

7.2 Entity Body

   The entity-body (if any) sent with an HTTP request or response is in
   a format and encoding defined by the entity-header fields.

       entity-body    = *OCTET

   An entity-body is only present in a message when a message-body is
   present, as described in section 4.3. The entity-body is obtained
   from the message-body by decoding any Transfer-Encoding that might
   have been applied to ensure safe and proper transfer of the message.

7.2.1 Type

   When an entity-body is included with a message, the data type of that
   body is determined via the header fields Content-Type and Content-
   Encoding. These define a two-layer, ordered encoding model:

       entity-body := Content-Encoding( Content-Type( data ) )
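
A minimal sketch of that two-layer model, assuming a text/plain body with
charset=utf-8 and gzip as the Content-Encoding (both are choices made for
the example, not requirements of the spec):

```python
import gzip

text = "Joe owes \u20ac100."  # sample data containing the Euro sign

# Inner layer, Content-Type: text/plain; charset=utf-8
typed = text.encode("utf-8")

# Outer layer, Content-Encoding: gzip
body = gzip.compress(typed)

# The receiver undoes the layers in reverse order.
recovered = gzip.decompress(body).decode("utf-8")
print(recovered == text)  # True
```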

And RFC 2388, the multipart/form-data spec, is completely clear on this:

4.5 Charset of text in form data

   Each part of a multipart/form-data is supposed to have a content-
   type.  In the case where a field element is text, the charset
   parameter for the text indicates the character encoding used.

   For example, a form with a text field in which a user typed 'Joe owes
   <eu>100' where <eu> is the Euro symbol might have form data returned

    content-disposition: form-data; name="field1"
    content-type: text/plain;charset=windows-1250
    content-transfer-encoding: quoted-printable

    Joe owes =80100.
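
As a sketch of what a server can do with such a part (Python's email package
is used here only as a convenient MIME parser, not as a recommendation for
how an HTTP server should be built), the part's own headers are enough to
recover the text:

```python
from email import message_from_string

# The part from the RFC 2388 example, headers and body.
part_text = (
    'content-disposition: form-data; name="field1"\n'
    "content-type: text/plain;charset=windows-1250\n"
    "content-transfer-encoding: quoted-printable\n"
    "\n"
    "Joe owes =80100.\n"
)

part = message_from_string(part_text)

# Undo the transfer encoding first (quoted-printable), giving raw bytes.
raw = part.get_payload(decode=True)

# Then interpret those bytes using the charset the part declares.
charset = part.get_content_charset()
text = raw.decode(charset).strip()

print(charset)
print(text == "Joe owes \u20ac100.")  # True: byte 0x80 is the Euro sign
```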

So what does the HTML spec have to say?  The <FORM> element does include
the accept-charset attribute; perhaps that is what you are looking for?
Otherwise, if the user agents don't observe RFC 2388, then you should
really take that up with the user agent vendors.

The official User-To-User support forum of the Apache HTTP Server Project.