tomcat-users mailing list archives

From: André Warnier
Subject: Re: [OT] cookie issue with Tomcat 7 - does not accept the character "é"
Date: Tue, 04 Feb 2014 10:59:35 GMT
Mark Thomas wrote:
> Cookie handling is fundamentally a complete mess. Specifications exist
> but are not fully implemented, are not consistent with related
> specifications, etc.
> Having tried to sort this out the last time around and having read
> Jeremy's great work on documenting where we stand at the present moment,
> it often feels like it wouldn't be too hard to make a case that just
> about any cookie name or value that isn't a token (as per RFC 2616) is
> either valid or invalid depending on which specification(s) you choose
> to read.
> I'd strongly encourage anyone thinking about commenting further on this
> thread to take the time to read the wiki page [1] where the Tomcat
> committers (and Jeremy in particular) are currently trying to figure out
> exactly how Tomcat should handle cookies in the future.
> Mark
> [1]

Hi, I agree with everything you say above.

About the Wiki, what seems to be missing are additional lines in the tables showing some
examples of cookie values containing what English-speaking people often call "additional"
or "accented" characters (and what other people just call "characters").  For example,
what happens when the cookie value is a string like "ÄÖÜäöüéèîôâ" (that's about the extent
of what I can enter easily on this current German keyboard).
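To make the ambiguity concrete, here is a minimal sketch (plain JDK, not Tomcat-specific) of why a value like "é" is troublesome: the same character is one byte in ISO-8859-1 but two bytes in UTF-8, and decoding with the wrong charset silently produces mojibake.

```java
import java.nio.charset.StandardCharsets;

public class AccentBytes {
    public static void main(String[] args) {
        String s = "é";
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // one byte:  0xE9
        byte[] utf8   = s.getBytes(StandardCharsets.UTF_8);      // two bytes: 0xC3 0xA9
        System.out.println(latin1.length + " vs " + utf8.length); // prints "1 vs 2"

        // Decoding the UTF-8 bytes as ISO-8859-1 gives the classic mojibake:
        String mangled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(mangled); // prints "é"
    }
}
```

Every party in the chain (browser, server, application) has to guess which of these two byte sequences it is looking at, and the cookie specifications give them no reliable way to agree.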

And let's also reflect on the fact that, no matter what else we have been discussing here,
we have still not provided the OP of this thread with any useful and practical
recommendation to resolve his problem, which seems to originate in a difference between how
Tomcat 6 and Tomcat 7 handle cookies with "accented characters" in their value.
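For the record, one practical workaround (a sketch only, assuming the application controls both setting and reading the cookie) is to percent-encode the value with the JDK's URLEncoder before putting it in the cookie, so that only US-ASCII characters ever reach the Set-Cookie header, and to decode it again on read:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CookieValueCodec {

    // Percent-encode a value so that it contains only US-ASCII characters,
    // which both Tomcat 6 and Tomcat 7 accept in a cookie value.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // Reverse the encoding when the cookie comes back from the browser.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String safe = encode("café");
        System.out.println(safe);         // prints "caf%C3%A9" - plain ASCII
        System.out.println(decode(safe)); // prints "café" - round-trips intact
    }
}
```

In a servlet one would then write `new Cookie("name", CookieValueCodec.encode(value))` and decode on the way back in; the cookie on the wire is then the same regardless of which Tomcat version parses it.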

Otherwise, to generalise the debate: it is not just cookies, but just about anything that
has to do with non-US-ASCII characters under HTTP and HTML which is a mess, and has been a
mess for several years if not decades.  The current jumble of RFCs that deal with this
issue is in the end more confusing than helpful.  And all the current "solutions" in terms
of implementation (browser-side as well as server-side) resemble patches over patches over
wooden legs.

I am not saying that resolving the issue is simple, nor that one can simply ignore the
past and/or backward-compatibility issues.  But, despite the immense respect I have for
people like Roy Fielding and their achievements, I cannot help getting the impression
that the Internet RFC mechanism is, in this respect, slowly becoming "fossilised", and that
nobody seems to have the energy and drive anymore to think radically and tackle the issue
from the top down.

Nobody disputes anymore that Unicode and UTF-8 provide a form of "universal"
solution to most of the issues in terms of alphabets, character sets and encodings,
suitable for 99% of the human users of computers and of the Internet.  And nobody
disputes anymore that 99% of the hardware and software currently in use can handle
arbitrary sequences of bytes and bits perfectly fine.

Yet in terms of programming "for the Internet", we still have to live with - and work
around every day - a set of standards and recommendations based on a myriad of alphabets
and encodings, each of which can properly represent only a tiny fraction of the languages
that people worldwide speak and read.
And the issues related to encoding, decoding and transliterating between these different
alphabets and encodings cost thousands of productive hours every day, quite apart from
the confusion and aggravation that they generate.

Why is it, exactly, that we can come up with things like WebSockets and HTML5 and SOAP and
Java annotations, but not with a new HTTP/HTML version which would make Unicode/UTF-8 the
*default*, and everything else the exception?

That, for the sake of interoperability and mutual comprehension, things like HTTP header
*names* should be restricted to sequences of printable characters in a limited range that
is available on all human interface devices and universally readable is one thing; but why
should HTTP header *values*, or URI path or query-string components (which often have to
carry real-world multilingual textual information), be similarly limited, confusing and
inconsistent?  Why does it still have to be so difficult, in 2014, to create a web
user-interface application which ensures that people from different countries can enter
their name and place of residence as they know them, and not have the server-side or
client-side application mangle them?

If someone were to take the text of RFC 2616 and replace every direct or indirect mention
of US-ASCII and ISO-8859-1 in it with Unicode/UTF-8, and present this as an RFC for
HTTP 2.0, would the Internet instantly crumble?
How does one go about doing this?
