tomcat-users mailing list archives

From André Warnier
Subject I18N, HTTP 2.0 ?
Date Sat, 18 Oct 2008 17:02:17 GMT

I am sending this to both the Apache httpd and Tomcat users lists, in 
the hope that because together these HTTP servers cover a good fraction 
of the market, there might be a chance to reach the right people.

My hope is that someone who is aware of, and connected to, the process 
of RFC generation would pick this up, or else inform us if some process 
in the direction that I am indicating below is already under way.

I apologise in advance if I am pushing at an open door. If so, I would 
be glad to be told what the current state of affairs is.
(A Google search on the terms "HTTP", "RFC" and "UTF-8" does not seem 
to yield any relevant results.)

Proposal:

It is becoming urgent to create a new HTTP standard/version/revision, 
that would be organised around Unicode as a default character set, and 
UTF-8 as a default encoding.

I believe that the spread and acceptance of Unicode and UTF-8 is now 
sufficient to warrant such an evolution.

The current situation, where iso-8859-1 is the default in some areas 
while other areas are either unspecified or vague, creates a lot of 
confusion and inefficiency, and raises barriers to the creation of 
truly international HTTP-based WWW applications.

Here are some areas where these problems appear:
- the encoding of URLs.
- the encoding of HTTP headers.
- the encoding of user credentials in browser-side Basic and Digest 
authentication dialogs, and their transmission to the server.
- the encoding of input elements from html forms, as transmitted by a 
client to a server, and the interpretation of that data by the server.
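To make the first and third items concrete, here is a minimal Python sketch (illustrative only, not part of the original message; the helper names `percent_encode`, `basic_auth_header` and `decode_basic_auth` are hypothetical). It assumes a client that may encode text either as UTF-8 or as ISO-8859-1, and shows that the same user-visible string produces different bytes on the wire, so a server that guesses the other charset recovers a different string.

```python
import base64
from urllib.parse import quote, unquote


def percent_encode(text: str, charset: str) -> str:
    """Percent-encode text after turning it into bytes with the given charset."""
    return quote(text, safe="", encoding=charset)


def basic_auth_header(user: str, password: str, charset: str) -> str:
    """Build a Basic Authorization value; which charset to use is the ambiguity."""
    raw = f"{user}:{password}".encode(charset)
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header: str, charset: str) -> str:
    """Decode a Basic Authorization value, assuming the server's charset guess."""
    token = header.split(" ", 1)[1]
    return base64.b64decode(token).decode(charset, errors="replace")


if __name__ == "__main__":
    word = "café"
    as_utf8 = percent_encode(word, "utf-8")         # caf%C3%A9
    as_latin1 = percent_encode(word, "iso-8859-1")  # caf%E9
    print(as_utf8, as_latin1)

    # A server guessing the wrong charset gets mojibake or a replacement char.
    print(unquote(as_utf8, encoding="iso-8859-1"))  # cafÃ©
    print(unquote(as_latin1, encoding="utf-8"))     # caf + U+FFFD

    # Same problem for Basic authentication credentials.
    header = basic_auth_header("josé", "secret", "utf-8")
    print(decode_basic_auth(header, "utf-8"))       # josé:secret
    print(decode_basic_auth(header, "iso-8859-1"))  # josÃ©:secret
```

Neither result is "wrong" from the server's point of view; without an agreed default, it simply has no way to know which charset the client used.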

I am quite sure that I am forgetting some aspects of the same issue.

For each of the above, there are areas where there is no specification, 
or areas where there are vague specifications, or areas where there are 
multiple apparently-contradictory specifications.
Consequently, there is a profusion of ad-hoc tricks and recipes, and 
various "parameters", "flags" and "settings" are starting to appear at 
the client and server level, which may help resolve the issues in some 
cases, but which in the long term create even more confusion and 
problems of interoperability.
(An example of such a setting is "use body encoding for URL", as in 
Tomcat's useBodyEncodingForURI connector attribute.)

There might be some efforts under way to tackle one or the other aspect 
of the above (I have heard of a proposal regarding HTTP headers), but I 
honestly believe that this issue can only be resolved well "at the top", 
which to me means in the HTTP protocol itself.
