tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: cookie issue with Tomcat 7 - does not accept the character "é"
Date Mon, 03 Feb 2014 23:19:22 GMT
Konstantin Kolinko wrote:
> 2014-02-04 André Warnier <aw@ice-sa.com>:
>> Konstantin Kolinko wrote:
>>> 2014-02-03 André Warnier <aw@ice-sa.com>:
>>>> André Warnier wrote:
>>>>> Chris,
>>>>>
>>>>> a note :
>>>>>
>>>>> Christopher Schultz wrote:
>>>>> ...
>>>>>
>>>>>
>>>>>> Without quoting, unquoted Cookie names and values may be any US-ASCII
>>>>>> character from 0x20 - 0x7e except for any of ("(" | ")" | "<" | ">" |
>>>>>> "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{"
>>>>>> | "}" | SP | HT). None of the characters above are within that range,
>>>>>> so the cookie value must be quoted. (It looks to me like Cookie names
>>>>>> must always be in US-ASCII... I didn't think that was the case but I'm
>>>>>> not motivated to track-down every word of the spec looking for
>>>>>> justification).
>>>>>>
>>>>>> What is the character encoding of the request? What client are you
>>>>>> using? Who created the cookie in the first place?
>>>>>>
>>>>> I did the tracking down of the (tortuous) specs, and come to this :
>>>>>
>>>>> 1) the ISO-8859-1 character set includes "é" (Catalan and other
>>>>> languages) (*)
>>>>>
>>>>> 2) the US-ASCII character set is a subset of ISO-8859-1, and does not
>>>>> include "é".
>>>>>
>>>>> 3) The default character set for HTTP 1.1 is ISO-8859-1, as stated
>>>>> explicitly and implicitly in various places in RFC 2616 [1].
>>>>>
>>>>> However, RFC 2616 does not define the "Cookie" nor "Set-Cookie" headers,
>>>>> and it also does not specifically indicate which character set should be
>>>>> used for HTTP Request/Response header values. It refers for that to the
>>>>> Internet text message format (RFC 822), which talks only about US-ASCII.
>>>>>
>>>>> 4) The "Cookie" and "Set-Cookie" headers seem to be most recently
>>>>> defined in RFC 6265 [2].
>>>>> In section 4.1.1 [3], the syntax of these headers is defined, as :
>>>>>
>>>>>  cookie-pair       = cookie-name "=" cookie-value
>>>>>  cookie-name       = token
>>>>>  cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
>>>>>  cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
>>>>>                        ; US-ASCII characters excluding CTLs,
>>>>>                        ; whitespace DQUOTE, comma, semicolon,
>>>>>                        ; and backslash
>>>>>  token             = <token, defined in [RFC2616], Section 2.2>
>>>>>
>>>>> Thus, it seems that you are right, and that a cookie *value* can
>>>>> (regrettably still) only consist of US-ASCII characters (thus not
>>>>> including "é").
>>>>>
>>>>> (I cannot find in the specs a way to quote a non-US-ASCII character
>>>>> either; that's apparently only allowed in parts defined as "comments")
>>>>>
>>>>> (It is stated somewhere else in RFC 6265 that it is recommended to
>>>>> encode the Cookie value via e.g. Base64, if it were to potentially
>>>>> contain non-US-ASCII characters).
>>>>>
>>>>> The cookie name is a "token", and the definition of "token" sends us
>>>>> back to RFC2616.
>>>>> In "2.2 Basic Rules", RFC2616 states :
>>>>>
>>>>>        token          = 1*<any CHAR except CTLs or separators>
>>>>>        separators     = "(" | ")" | "<" | ">" | "@"
>>>>>                       | "," | ";" | ":" | "\" | <">
>>>>>                       | "/" | "[" | "]" | "?" | "="
>>>>>                       | "{" | "}" | SP | HT
>>>>> ...
>>>>>        CHAR           = <any US-ASCII character (octets 0 - 127)>
>>>>>        CTL            = <any US-ASCII control character
>>>>>                         (octets 0 - 31) and DEL (127)>
>>>>>
>>>>> So, this all would tend to show that you are right, and that Cookie
>>>>> names (as well as values) can only consist of US-ASCII characters, and
>>>>> that "é" is thus not allowed (without some form of encoding that would
>>>>> represent it as a sequence of US-ASCII characters).
>>>>>
>>>>> Which, in my personal opinion, is a lasting p-i-t-a and shame.  And I
>>>>> cannot imagine how it can be nowadays that nobody has yet gotten around
>>>>> to proposing a HTTP 2.0 RFC where the default character set would be
>>>>> Unicode, UTF-8 encoded, for everything excluding maybe header names.
>>>>> But that's neither here nor there.
>>>>>
>>>>> To get back to the OP's original question thus, it seems to me that
>>>>> - Tomcat 7.x would not be in violation of the specs, if it indeed
>>>>> rejects a Cookie header containing any non-US-ASCII character (whether
>>>>> in the cookie name or value).
>>>>> - the error message could be improved ("é" is not a control
>>>>> character, it's just invalid here)
>>>>> - but the real fix for the OP may be to Base64-encode the cookie
>>>>> value before sending it to the browser.
>>>>> That's also because it may happen one day that even a browser which
>>>>> respects the specs to the letter (one never knows), could reject a value
>>>>> like : "abcé","abc","abc","abc","abc","abc","abc","abc","abc";
>>>>>
>>>>>
>>>>> [1] http://tools.ietf.org/search/rfc2616
>>>>> [2] http://tools.ietf.org/search/rfc6265
>>>>> [3] http://tools.ietf.org/search/rfc6265#section-4.1.1
>>>>>
>>>>>
>>>> As an appendix, and triggered by another post to this list, here is
>>>> another way of encoding HTTP header values :
>>>>
>>>> Cookie: ACE_COOKIE=R660302447; TD3World=R760446058
>>>> SM_TRANSACTIONID: =?UTF-8?B?MGE2NDA2MDEtNDAzMy01MjdjYzlkMy0wMDBhLTJjMWI0NjJi?=
>>>> SM_AUTHTYPE: =?UTF-8?B?QXV0bw==?=
>>>> SM_SDOMAIN: =?UTF-8?B?LnRveW90YS1ldXJvcGUuY29t?=
>>>>
>>>> In this case, the SM_* header values (not the Cookie header itself) are
>>>> encoded using a "MIME extension" scheme which indicates (between =? ? ?),
>>>> prior to a string's value, the character set/encoding in which the
>>>> subsequent string is to be interpreted.
>>>> This is not explicitly mentioned in any of the above references, but as I
>>>> recall, this is part of another series of RFC's, maybe starting at this
>>>> one :
>>>> http://tools.ietf.org/html/rfc2184
>>>> Now how this works out (also browser-side) with Cookie headers composed
>>>> of cookie names and values, I couldn't say.
>>>>
>>> RFC 2616 also says the following on page 16:
>>>
>>>    The TEXT rule is only used for descriptive field contents and values
>>>    that are not intended to be interpreted by the message parser. Words
>>>    of *TEXT MAY contain characters from character sets other than ISO-
>>>    8859-1 [22] only when encoded according to the rules of RFC 2047
>>>    [14].
>>>
>>>        TEXT           = <any OCTET except CTLs,
>>>                         but including LWS>
>>>
>>> RFC 2047 is also referenced in Javadoc for HttpServletResponse.setHeader()
>>>
>>> The "B" encoding used in an example above is one of encodings allowed
>>> by RFC2047 ch.4.1.
>>>
>>> http://www.ietf.org/rfc/rfc2047.txt
>>>
>> Yes, but it never says anywhere that a "cookie value" may contain "*TEXT".
>> Explicitly, it only mentions "*cookie-octet".
>>
> 
> I meant the following part (page 32 of RFC 2616) which defines what
> syntax of HTTP headers is, in general.
> 
>        message-header = field-name ":" [ field-value ]
>        field-name     = token
>        field-value    = *( field-content | LWS )
>        field-content  = <the OCTETs making up the field-value
>                         and consisting of either *TEXT or combinations
>                         of token, separators, and quoted-string>
> 
> TEXT is as I quoted above.
> tokens are US-ASCII minus some characters
> quoted-string is TEXT inside of double quotes.
> 
> Thus there are limits on headers syntax in general,
> including "Cookie" and "Set-Cookie" headers.
> 
>> And, what does it all mean browser-side, particularly for Cookies ?
>>
> 
> Browsers have to be compliant. Are they?

Supposedly.  But in practice, are they ?
If I send from the server a cookie via a Response header like :

Set-Cookie: =?iso-8859-1?B?mycookie=äöüéè

do IE 7+, Firefox, Chrome browsers interpret this correctly, and understand this as a 
cookie named "mycookie" with a value of "äöüéè" ?
If one of them doesn't, then this is not a practical answer to the OP's problem.

(He can complain to the developers of the non-compliant browser, but how much of a chance 
does he have to get it fixed soon enough for his problem ?)
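
For reference, here is a minimal sketch of what a complete RFC 2047 "B" encoded-word looks like, i.e. "=?" charset "?" encoding "?" encoded-text "?=", where the encoded-text is Base64. The class and method names are made up for illustration, and java.util.Base64 assumes Java 8 or later. It only shows how the string is built and taken apart again; it says nothing about whether any browser or Tomcat version would decode such a value inside a Set-Cookie or Cookie header.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
public class EncodedWordSketch {

    // Builds "=?charset?B?base64-text?=" from the given text.
    static String encodeB(String text, Charset charset) {
        String b64 = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + b64 + "?=";
    }

    // Parses a single "B" encoded-word back into a Java String.
    static String decodeB(String word) {
        // Expected parts: "=", charset, "B", encoded-text, "="
        String[] parts = word.split("\\?", -1);
        if (parts.length != 5 || !parts[0].equals("=") || !parts[4].equals("=")
                || !parts[2].equalsIgnoreCase("B")) {
            throw new IllegalArgumentException("not a B encoded-word: " + word);
        }
        byte[] raw = Base64.getDecoder().decode(parts[3]);
        return new String(raw, Charset.forName(parts[1]));
    }

    public static void main(String[] args) {
        // \u00e4\u00f6\u00fc\u00e9\u00e8 is the string "äöüéè"
        String word = encodeB("\u00e4\u00f6\u00fc\u00e9\u00e8", StandardCharsets.ISO_8859_1);
        System.out.println(word);
        System.out.println(decodeB(word));
        // One of the header values quoted earlier in this thread
        System.out.println(decodeB("=?UTF-8?B?QXV0bw==?="));
    }
}
```

Note that the whole payload is Base64 text here; the raw "mycookie=äöüéè" never appears on the wire.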

> 
>> Most browsers for example have a "show cookies" function, where they will
>> display the cookie name, value, and other attributes separately.
> 
> That is display of their internal database. It has nothing to do with
> what is allowed on the wire.
> 

Agreed, but my point here was to illustrate the "does the browser understand it" question.

To be practical :

In the OP's original question, the server application would like to set a cookie named 
'GetUser_Properties' with a value '"abc","abcé","abc","abc","abc","abc","abc","abc","abc"'

Clearly (I think) according to the specs, this is not valid :

Set-Cookie: GetUser_Properties="abc","abcé","abc","abc","abc","abc","abc","abc","abc"
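
To make that check concrete, here is a small sketch that tests a candidate value against the cookie-value grammar of RFC 6265 section 4.1.1 as quoted earlier in this thread. The class and method names are hypothetical; this is not how Tomcat validates cookies internally, just the grammar transcribed literally.

```java
// Hypothetical checker, transcribing the RFC 6265 grammar quoted above.
public class CookieOctetCheck {

    // cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
    static boolean isCookieOctet(char c) {
        return c == 0x21
                || (c >= 0x23 && c <= 0x2B)
                || (c >= 0x2D && c <= 0x3A)
                || (c >= 0x3C && c <= 0x5B)
                || (c >= 0x5D && c <= 0x7E);
    }

    // cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
    static boolean isValidCookieValue(String value) {
        String inner = value;
        // Strip one pair of enclosing double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            inner = value.substring(1, value.length() - 1);
        }
        for (int i = 0; i < inner.length(); i++) {
            if (!isCookieOctet(inner.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The OP's value; \u00e9 is "é"
        String op = "\"abc\",\"abc\u00e9\",\"abc\",\"abc\",\"abc\",\"abc\",\"abc\",\"abc\",\"abc\"";
        System.out.println(isValidCookieValue(op));        // false
        System.out.println(isValidCookieValue("\"abc\"")); // true
    }
}
```

The OP's value fails for more than one reason: besides the "é", the embedded double quotes and commas are excluded from cookie-octet as well.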

As I understand from the specs, this might be valid (one line):

Set-Cookie: =?iso-8859-1?Q?GetUser_Properties="abc","abcé","abc","abc","abc","abc","abc","abc","abc"

But, does an average browser understand it ?
And if Tomcat 6 / 7 receive a cookie header like (1 line) :

Cookie: =?iso-8859-1?Q?GetUser_Properties="abc","abcé","abc","abc","abc","abc","abc","abc","abc"

do they understand it ?

Or, would you have any other recommendation of how the server should set this cookie ?

What I would do (assuming that the browser side itself doesn't need to use the value of 
the cookie, just re-send it to the server unchanged) is :

Set-Cookie: GetUser_Properties=BBBB..BB

where "BBBB..BB" is the Base64-encoded value of the iso-8859-1 string 
'"abc","abcé","abc","abc","abc","abc","abc","abc","abc"'

and the server-side application, when receiving the corresponding cookie, should 
Base64-decode the cookie value before parsing it.

And I am quite sure that
1) it matches all the specs
2) any browser and any server would support this
3) although it is not "native" to any webserver to Base64-encode/decode cookie values, it is 
fairly easy to implement
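
A minimal sketch of that encode/decode step, assuming Java 8 or later for java.util.Base64. The class name CookieValueCodec is made up; the charset is ISO-8859-1 because that is what the value is assumed to be here (characters outside ISO-8859-1 would be lost, so UTF-8 may be the safer choice for new applications). I picked the URL-safe Base64 alphabet without padding, so the result contains only letters, digits, "-" and "_", which are all valid cookie-octets and also avoid "=", "+" and "/", which some cookie parsers may treat specially.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
public class CookieValueCodec {

    // Turns an arbitrary ISO-8859-1 string into a US-ASCII-only cookie value.
    static String encode(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Reverses encode(); call this on the value received in the Cookie header.
    static String decode(String cookieValue) {
        byte[] bytes = Base64.getUrlDecoder().decode(cookieValue);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The OP's value; \u00e9 is "é"
        String raw = "\"abc\",\"abc\u00e9\",\"abc\",\"abc\",\"abc\",\"abc\",\"abc\",\"abc\",\"abc\"";
        String encoded = encode(raw);
        System.out.println("Set-Cookie: GetUser_Properties=" + encoded);
        System.out.println(decode(encoded).equals(raw));
    }
}
```

In a servlet, the encoded string would be what gets passed as the value when creating the javax.servlet.http.Cookie, and decode() would be applied to getValue() when the cookie comes back.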





