tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: cookie issue with Tomcat 7 - does not accept the character "é"
Date Mon, 03 Feb 2014 11:56:13 GMT
Chris,

a note :

Christopher Schultz wrote:
...


> 
> Without quoting, unquoted Cookie names and values may be any US-ASCII
> character from 0x32 - 0x7e except for any of ("(" | ")" | "<" | ">" |
> "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{"
> | "}" | SP | HT). None of the characters above are within that range,
> so the cookie value must be quoted. (It looks to me like Cookie names
> must always be in US-ASCII... I didn't think that was the case but I'm
> not motivated to track-down every word of the spec looking for
> justification).
> 
> What is the character encoding of the request? What client are you
> using? Who created the cookie in the first place?
> 

I did track down the (tortuous) specs, and came to this:

1) the ISO-8859-1 character set includes "é" (Catalan and other languages) (*)

2) the US-ASCII character set is a subset of ISO-8859-1, and does not include "é".

3) The default character set for HTTP 1.1 is ISO-8859-1, as stated explicitly and 
implicitly in various places in RFC 2616 [1].

However, RFC 2616 defines neither the "Cookie" nor the "Set-Cookie" header, and it also
does not specifically indicate which character set should be used for HTTP Request/Response
header values. It refers for that to RFC 822 (the Internet text message format), which
talks only about US-ASCII.

4) The "Cookie" and "Set-Cookie" headers seem to have their latest definition in RFC 6265
[2].
In section 4.1.1 [3], the syntax of these headers is defined as:

  cookie-pair       = cookie-name "=" cookie-value
  cookie-name       = token
  cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
  cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                        ; US-ASCII characters excluding CTLs,
                        ; whitespace DQUOTE, comma, semicolon,
                        ; and backslash
  token             = <token, defined in [RFC2616], Section 2.2>

Thus, it seems that you are right, and that a cookie *value* can (regrettably still) only
consist of US-ASCII characters (not including "é" thus).
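
To make the cookie-octet rule quoted above concrete, here is a minimal sketch in Java of a
check against that grammar. The class and method names (CookieOctets, isCookieOctet,
isValidCookieValue) are made up for illustration; this is not Tomcat's parser, only a
direct transcription of the ABNF ranges from RFC 6265 section 4.1.1.

```java
final class CookieOctets {
    private CookieOctets() {}

    /** True if c falls in one of the cookie-octet ranges of RFC 6265, section 4.1.1. */
    static boolean isCookieOctet(char c) {
        return c == 0x21
            || (c >= 0x23 && c <= 0x2B)
            || (c >= 0x2D && c <= 0x3A)
            || (c >= 0x3C && c <= 0x5B)
            || (c >= 0x5D && c <= 0x7E);
    }

    /** cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ) */
    static boolean isValidCookieValue(String value) {
        int start = 0;
        int end = value.length();
        // Strip one optional pair of surrounding double quotes.
        if (end >= 2 && value.charAt(0) == '"' && value.charAt(end - 1) == '"') {
            start = 1;
            end--;
        }
        for (int i = start; i < end; i++) {
            if (!isCookieOctet(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, "abc" and "\"abc\"" pass, while "abcé" fails whether or not it is quoted,
because "é" lies outside every range listed in the grammar.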

(I cannot find in the specs a way to quote a non-US-ASCII character either; that's 
apparently only allowed in parts defined as "comments")

(It is stated somewhere else in RFC 6265 that it is recommended to encode the Cookie value
via e.g. Base64, if it were to potentially contain non US-ASCII characters).

The cookie name is a "token", and the definition of "token" sends us back to RFC2616.
In "2.2 Basic Rules", RFC2616 states :

       token          = 1*<any CHAR except CTLs or separators>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT
...
       CHAR           = <any US-ASCII character (octets 0 - 127)>
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>

So, this all would tend to show that you are right, and that Cookie names (as well as
values) can only consist of US-ASCII characters, and that "é" is thus not allowed (without
some form of encoding that would represent it as a sequence of US-ASCII characters).
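
The "token" rule for cookie names can be sketched the same way. Again this is only an
illustration in Java under my own naming (HttpToken and isToken are hypothetical), written
straight from the RFC 2616 section 2.2 definitions quoted above.

```java
final class HttpToken {
    private HttpToken() {}

    // The separators listed in RFC 2616, section 2.2, including SP and HT.
    private static final String SEPARATORS = "()<>@,;:\\\"/[]?={} \t";

    /** token = 1*<any CHAR except CTLs or separators> */
    static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false; // "1*" means at least one character
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127) {
                return false; // not a CHAR (US-ASCII, octets 0 - 127)
            }
            if (c <= 31 || c == 127) {
                return false; // CTL
            }
            if (SEPARATORS.indexOf(c) >= 0) {
                return false; // separator
            }
        }
        return true;
    }
}
```

A name such as "JSESSIONID" is accepted by this check; "café" is rejected on the CHAR test,
before the CTL and separator tests are even reached.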

Which, in my personal opinion, is a lasting p-i-t-a and shame.  And I cannot imagine how it
can be that, nowadays, nobody has yet gotten around to proposing an HTTP 2.0 RFC where the
default character set would be Unicode, UTF-8 encoded, for everything excluding maybe
header names.  But that's neither here nor there.

To get back to the OP's original question, then, it seems to me that
- Tomcat 7.x would not be in violation of the specs if it indeed rejects a Cookie header
containing any non-US-ASCII character (whether in the cookie name or value),
- the error message could be improved ("é" is not a control character, it's just
invalid here),
- but the real fix for the OP may be to Base64-encode the cookie value before sending
it to the browser.
That's also because it may happen one day that even a browser which respects the specs to
the letter (one never knows) could reject a value like :
"abcé","abc","abc","abc","abc","abc","abc","abc","abc";
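
As a rough illustration of the Base64 suggestion, here is a minimal Java sketch. It assumes
Java 8 or later (for java.util.Base64), UTF-8 as the text encoding, and the URL-safe
alphabet without padding so that the result stays within letters, digits, "-" and "_". The
class name CookieValueCodec and its methods are hypothetical; on older JVMs some other
Base64 implementation would be needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class CookieValueCodec {
    private CookieValueCodec() {}

    /** Encodes arbitrary text into a US-ASCII-only string usable as a cookie value. */
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(utf8);
    }

    /** Reverses encode(); throws IllegalArgumentException on malformed input. */
    static String decode(String cookieValue) {
        byte[] utf8 = Base64.getUrlDecoder().decode(cookieValue);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

The application would call encode() before creating the cookie and decode() on the value it
reads back from the request, so that only US-ASCII ever travels in the header.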


[1] http://tools.ietf.org/search/rfc2616
[2] http://tools.ietf.org/search/rfc6265
[3] http://tools.ietf.org/search/rfc6265#section-4.1.1


(*) The question of whether Catalan is a language, or merely a northern dialect of 
Valenciano, is left to the reader's appreciation ( ;-) ).
