tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: cookie issue with Tomcat 7 - does not accept the character "é"
Date Mon, 03 Feb 2014 12:19:49 GMT
André Warnier wrote:
> Chris,
> 
> a note :
> 
> Christopher Schultz wrote:
> ...
> 
> 
>>
>> Without quoting, unquoted Cookie names and values may be any US-ASCII
>> character from 0x20 - 0x7e except for any of ("(" | ")" | "<" | ">" |
>> "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{"
>> | "}" | SP | HT). None of the characters above are within that range,
>> so the cookie value must be quoted. (It looks to me like Cookie names
>> must always be in US-ASCII... I didn't think that was the case but I'm
>> not motivated to track-down every word of the spec looking for
>> justification).
>>
>> What is the character encoding of the request? What client are you
>> using? Who created the cookie in the first place?
>>
> 
> I did the tracking down of the (tortuous) specs, and came to this :
> 
> 1) the ISO-8859-1 character set includes "é" (Catalan and other 
> languages) (*)
> 
> 2) the US-ASCII character set is a subset of ISO-8859-1, and does not 
> include "é".
> 
> 3) The default character set for HTTP 1.1 is ISO-8859-1, as stated 
> explicitly and implicitly in various places in RFC 2616 [1].
> 
> However, RFC 2616 does not define the "Cookie" or "Set-Cookie" headers, 
> and it also does not specifically indicate which character set should be 
> used for HTTP Request/Response header values. For the generic header 
> format it refers to RFC 822 (Internet text messages), which talks only 
> about US-ASCII.
> 
> 4) The "Cookie" and "Set-Cookie" headers seem to be most recently 
> defined in RFC 6265 [2].
> In section 4.1.1 [3], the syntax of these headers is defined as :
> 
>  cookie-pair       = cookie-name "=" cookie-value
>  cookie-name       = token
>  cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
>  cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
>                        ; US-ASCII characters excluding CTLs,
>                        ; whitespace DQUOTE, comma, semicolon,
>                        ; and backslash
>  token             = <token, defined in [RFC2616], Section 2.2>
> 
> Thus, it seems that you are right, and that a cookie *value* can 
> (regrettably still) only consist of US-ASCII characters (not including 
> "é" thus).
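
To make the grammar above concrete, here is a minimal sketch of a checker for cookie-value as quoted from RFC 6265 section 4.1.1. The function name is_valid_cookie_value is hypothetical, and this is only an illustration of the grammar, not how Tomcat itself validates cookies.

```python
import re

# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
# (US-ASCII without CTLs, whitespace, DQUOTE, comma, semicolon, backslash)
_OCTETS = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
_COOKIE_VALUE = re.compile(f'(?:{_OCTETS}|"{_OCTETS}")')


def is_valid_cookie_value(value: str) -> bool:
    """Return True if value matches the cookie-value grammar (hypothetical helper)."""
    return _COOKIE_VALUE.fullmatch(value) is not None


print(is_valid_cookie_value("abc"))    # plain US-ASCII value
print(is_valid_cookie_value('"abc"'))  # same value wrapped in DQUOTEs
print(is_valid_cookie_value("abcé"))   # contains a non-US-ASCII character
```

Under this grammar the first two values are accepted and the third is rejected, and wrapping "abcé" in double quotes does not help, since the quoted form allows exactly the same octets.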
> 
> (I cannot find in the specs a way to quote a non-US-ASCII character 
> either; that's apparently only allowed in parts defined as "comments")
> 
> (It is stated somewhere else in RFC 6265 that it is recommended to 
> encode the Cookie value via e.g. Base64, if it were to potentially 
> contain non US-ASCII characters).
> 
> The cookie name is a "token", and the definition of "token" sends us 
> back to RFC2616.
> In "2.2 Basic Rules", RFC2616 states :
> 
>       token          = 1*<any CHAR except CTLs or separators>
>       separators     = "(" | ")" | "<" | ">" | "@"
>                      | "," | ";" | ":" | "\" | <">
>                      | "/" | "[" | "]" | "?" | "="
>                      | "{" | "}" | SP | HT
> ...
>       CHAR           = <any US-ASCII character (octets 0 - 127)>
>       CTL            = <any US-ASCII control character
>                        (octets 0 - 31) and DEL (127)>
> 
> So, this all would tend to show that you are right, and that Cookie 
> names (as well as values) can only consist of US-ASCII characters, and 
> that "é" is thus not allowed (without some form of encoding that would 
> represent it as a sequence of US-ASCII characters).
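
The "token" rule quoted above can be sketched the same way. The helper name is_token is hypothetical; it simply applies the RFC 2616 section 2.2 definitions of CHAR, CTL and separators to a candidate cookie name.

```python
# separators from RFC 2616 section 2.2, including SP and HT
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(name: str) -> bool:
    """Return True if name is 1*<any CHAR except CTLs or separators>."""
    if not name:
        return False  # a token needs at least one character
    return all(
        # CHAR is octets 0-127; CTLs are 0-31 and DEL (127)
        31 < ord(ch) < 127 and ch not in SEPARATORS
        for ch in name
    )


print(is_token("JSESSIONID"))  # plain US-ASCII name
print(is_token("nomé"))        # "é" is outside CHAR
print(is_token("a=b"))         # "=" is a separator
```

Only the first name passes: "é" falls outside the 0-127 range of CHAR, and "=" is listed among the separators.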
> 
> Which, in my personal opinion, is a lasting p-i-t-a and shame.  And I 
> cannot imagine how it can be nowadays that nobody has yet gotten around 
> to proposing an HTTP 2.0 RFC where the default character set would be 
> Unicode, UTF-8 encoded, for everything excluding maybe header names.  
> But that's neither here nor there.
> 
> To get back to the OP's question, it seems to me that
> - Tomcat 7.x would not be in violation of the specs if it indeed 
> rejects a Cookie header containing any non-US-ASCII character (whether 
> in the cookie name or value);
> - the error message could be improved ("é" is not a control 
> character, it's just invalid here);
> - the real fix for the OP may be to Base64-encode the cookie 
> value before sending it to the browser.
> That's also because it may happen one day that even a browser which 
> respects the specs to the letter (one never knows), could reject a value 
> like : "abcé","abc","abc","abc","abc","abc","abc","abc","abc";
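
A minimal sketch of the Base64 approach suggested above, assuming the application chooses UTF-8 as the underlying text encoding (an assumption; both the code that sets the cookie and the code that reads it back must agree on it). The helper names are hypothetical.

```python
import base64


def encode_cookie_value(text: str) -> str:
    """Turn arbitrary text into a US-ASCII-only cookie value (UTF-8, then Base64)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_cookie_value(value: str) -> str:
    """Reverse encode_cookie_value when the cookie comes back from the browser."""
    return base64.b64decode(value).decode("utf-8")


encoded = encode_cookie_value("abcé")
print(encoded)                       # YWJjw6k=
print(decode_cookie_value(encoded))  # abcé
```

The standard Base64 alphabet (letters, digits, "+", "/" and "=") falls entirely within the cookie-octet ranges of RFC 6265, so the encoded value needs no quoting.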
> 
> 
> [1] http://tools.ietf.org/search/rfc2616
> [2] http://tools.ietf.org/search/rfc6265
> [3] http://tools.ietf.org/search/rfc6265#section-4.1.1
> 
> 

As an appendix, and triggered by another post to this list, here is another way of 
encoding HTTP header values :

Cookie: ACE_COOKIE=R660302447; TD3World=R760446058
SM_TRANSACTIONID: =?UTF-8?B?MGE2NDA2MDEtNDAzMy01MjdjYzlkMy0wMDBhLTJjMWI0NjJi?=
SM_AUTHTYPE: =?UTF-8?B?QXV0bw==?=
SM_SDOMAIN: =?UTF-8?B?LnRveW90YS1ldXJvcGUuY29t?=

In this case, the values of the SM_* headers are encoded using a "MIME extension" 
scheme which indicates (between =? ? ?), prior to a string's value, the character 
set/encoding in which the subsequent string is to be interpreted.
This is not explicitly mentioned in any of the above references, but as I recall, this 
is part of another series of RFCs, the encoded-word syntax itself being defined in 
this one :
http://tools.ietf.org/html/rfc2047
Now how this works out (also browser-side) with Cookie headers composed of cookie names 
and values, I couldn't say.
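
For illustration, the =?charset?encoding?text?= values shown above can be decoded with Python's standard email.header module. This is only a sketch of what the encoding contains; the helper name decode_header_value is hypothetical, and nothing here implies that a servlet container or browser would apply this decoding to HTTP headers.

```python
from email.header import decode_header, make_header


def decode_header_value(raw: str) -> str:
    """Decode any =?charset?encoding?text?= encoded-words found in raw."""
    return str(make_header(decode_header(raw)))


print(decode_header_value("=?UTF-8?B?QXV0bw==?="))                  # Auto
print(decode_header_value("=?UTF-8?B?LnRveW90YS1ldXJvcGUuY29t?="))  # .toyota-europe.com
```

The "B" between the question marks selects Base64, and "UTF-8" names the character set in which the decoded bytes are to be read.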

