Date: Tue, 4 Feb 2014 01:57:58 +0400
Subject: Re: cookie issue with Tomcat 7 - does not accept the character "é"
From: Konstantin Kolinko
To: Tomcat Users List

2014-02-04 André Warnier:
> Konstantin Kolinko wrote:
>>
>> 2014-02-03 André Warnier:
>>>
>>> André Warnier wrote:
>>>>
>>>> Chris,
>>>>
>>>> a note :
>>>>
>>>> Christopher Schultz wrote:
>>>> ...
>>>>
>>>>
>>>>> Without quoting, unquoted Cookie names and values may be any US-ASCII
>>>>> character from 0x32 - 0x7e except for any of ("(" | ")" | "<" | ">" |
>>>>> "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{"
>>>>> | "}" | SP | HT). None of the characters above are within that range,
>>>>> so the cookie value must be quoted. (It looks to me like Cookie names
>>>>> must always be in US-ASCII... I didn't think that was the case but I'm
>>>>> not motivated to track-down every word of the spec looking for
>>>>> justification).
>>>>>
>>>>> What is the character encoding of the request? What client are you
>>>>> using? Who created the cookie in the first place?
>>>>>
>>>> I did the tracking down of the (tortuous) specs, and come to this :
>>>>
>>>> 1) the ISO-8859-1 character set includes "é" (Catalan and other
>>>> languages)
>>>> (*)
>>>>
>>>> 2) the US-ASCII character set is a subset of ISO-8859-1, and does not
>>>> include "é".
>>>>
>>>> 3) The default character set for HTTP 1.1 is ISO-8859-1, as stated
>>>> explicitly and implicitly in various places in RFC 2616 [1].
>>>>
>>>> However, RFC 2616 does not define the "Cookie" nor "Set-Cookie" headers,
>>>> and it also does not specifically indicate which character set should be
>>>> used for HTTP Request/Response header values. It refers for that to MIME
>>>> (RFC 822), which talks only about US-ASCII.
>>>>
>>>> 4) The "Cookie" and "Set-Cookie" headers seem to be subsequently and
>>>> lastly defined in RFC 6265 [2].
>>>> In section 4.1.1 [3], the syntax of these headers is defined, as :
>>>>
>>>> cookie-pair  = cookie-name "=" cookie-value
>>>> cookie-name  = token
>>>> cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
>>>> cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
>>>>                ; US-ASCII characters excluding CTLs,
>>>>                ; whitespace DQUOTE, comma, semicolon,
>>>>                ; and backslash
>>>> token        = <token, defined in [RFC2616], Section 2.2>
>>>>
>>>> Thus, it seems that you are right, and that a cookie *value* can
>>>> (regrettably still) only consist of US-ASCII characters (not including
>>>> "é"
>>>> thus).
>>>>
>>>> (I cannot find in the specs a way to quote a non-US-ASCII character
>>>> either; that's apparently only allowed in parts defined as "comments")
>>>>
>>>> (It is stated somewhere else in RFC 6265 that it is recommended to
>>>> encode
>>>> the Cookie value via e.g. Base64, if it were to potentially contain non
>>>> US-ASCII characters).
>>>>
>>>> The cookie name is a "token", and the definition of "token" sends us
>>>> back
>>>> to RFC2616.
>>>> In "2.2 Basic Rules", RFC2616 states :
>>>>
>>>> token      = 1*<any CHAR except CTLs or separators>
>>>> separators = "(" | ")" | "<" | ">" | "@"
>>>>            | "," | ";" | ":" | "\" | <">
>>>>            | "/" | "[" | "]" | "?" | "="
>>>>            | "{" | "}" | SP | HT
>>>> ...
>>>> CHAR       = <any US-ASCII character (octets 0 - 127)>
>>>> CTL        = <any US-ASCII control character
>>>>              (octets 0 - 31) and DEL (127)>
>>>>
>>>> So, this all would tend to show that you are right, and that Cookie
>>>> names
>>>> (as well as values) can only consist of US-ASCII characters, and that
>>>> "é" is
>>>> thus not allowed (without some form of encoding that would represent it
>>>> as a
>>>> sequence of US-ASCII characters).
>>>>
>>>> Which, in my personal opinion is a lasting p-i-t-a and shame. And I
>>>> cannot imagine how it can be nowadays that nobody has yet gotten around
>>>> to
>>>> proposing a HTTP 2.0 RFC where the default character set would be
>>>> Unicode,
>>>> UTF-8 encoded, for everything excluding maybe header names. But that's
>>>> neither here nor there.
>>>>
>>>> To get back to the original OP's question thus, it seems to me that
>>>> - Tomcat 7.x would not be in violation of the specs, if it indeed
>>>> rejects
>>>> a Cookie header containing any non-US-ASCII character (whether in the
>>>> cookie
>>>> name or value).
>>>> - That the error message could be improved ("é" is not a control
>>>> character, it's just invalid here)
>>>> - but that the real fix for the OP may be to Base64-encode the cookie
>>>> value before sending it to the browser.
>>>> That's also because it may happen one day that even a browser which
>>>> respects the specs to the letter (one never knows), could reject a value
>>>> like : "abcé","abc","abc","abc","abc","abc","abc","abc","abc";
>>>>
>>>>
>>>> [1] http://tools.ietf.org/search/rfc2616
>>>> [2] http://tools.ietf.org/search/rfc6265
>>>> [3] http://tools.ietf.org/search/rfc6265#section-4.1.1
>>>>
>>>>
>>> As an appendix, and triggered by another post to this list, here is
>>> another
>>> way of encoding HTTP header values :
>>>
>>> Cookie: ACE_COOKIE=R660302447; TD3World=R760446058
>>> SM_TRANSACTIONID:
>>> =?UTF-8?B?MGE2NDA2MDEtNDAzMy01MjdjYzlkMy0wMDBhLTJjMWI0NjJi?=
>>> SM_AUTHTYPE: =?UTF-8?B?QXV0bw==?=
>>> SM_SDOMAIN: =?UTF-8?B?LnRveW90YS1ldXJvcGUuY29t?=
>>>
>>> In this case, the cookie values are encoded using a "MIME extension"
>>> scheme
>>> which indicates (between =? ? ?) prior to a string's value, the character
>>> set/encoding in which the subsequent string is to be interpreted.
>>> This is not explicitly mentioned in any of the above references, but as I
>>> recall, this is part of another series of RFC's, maybe starting at this
>>> one
>>> :
>>> http://tools.ietf.org/html/rfc2184
>>> Now how this works out (also browser-side) with Cookie headers composed
>>> of
>>> cookie names and values, I couldn't say.
>>>
>>
>> RFC 2616
>> also says the following on page 16:
>>
>> The TEXT rule is only used for descriptive field contents and values
>> that are not intended to be interpreted by the message parser. Words
>> of *TEXT MAY contain characters from character sets other than ISO-
>> 8859-1 [22] only when encoded according to the rules of RFC 2047
>> [14].
>>
>> TEXT = <any OCTET except CTLs,
>>        but including LWS>
>>
>> RFC 2047 is also referenced in Javadoc for HttpServletResponse.setHeader()
>>
>> The "B" encoding used in an example above is one of encodings allowed
>> by RFC2047 ch.4.1.
>>
>> http://www.ietf.org/rfc/rfc2047.txt
>>
>
> Yes, but it never says anywhere that a "cookie value" may contain "*TEXT".
> Explicitly, it only mentions "*cookie-octet".
>

I meant the following part (page 32 of RFC 2616), which defines the
syntax of HTTP headers in general.

    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>

TEXT is as I quoted above.
tokens are US-ASCII minus some characters.
quoted-string is TEXT inside of double quotes.

Thus there are limits on header syntax in general, including the "Cookie"
and "Set-Cookie" headers.

> And, what does it all mean browser-side, particularly for Cookies ?
>

Browsers have to be compliant. Are they?

> Most browsers for example have a "show cookies" function, where they will
> display the cookie name, value, and other attributes separately.

That is a display of their internal database. It has nothing to do with
what is allowed on the wire.

Best regards,
Konstantin Kolinko
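
For readers who want to try the rules discussed in this thread, here is a
minimal Python sketch. It is an illustration only, not what Tomcat does
internally. It checks a value against the RFC 6265 cookie-octet rule quoted
above, Base64-encodes a value that fails the check (the workaround suggested
earlier in the thread), and decodes an RFC 2047 "B" encoded-word such as the
SM_AUTHTYPE value shown above. The function names are made up for this
example; only the standard-library calls are real.

```python
import base64
import re
from email.header import decode_header

# cookie-octet per RFC 6265 section 4.1.1:
# %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
_COOKIE_OCTETS = re.compile(r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*")


def is_valid_cookie_value(value: str) -> bool:
    """Return True if value matches *cookie-octet, optionally in DQUOTEs."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return _COOKIE_OCTETS.fullmatch(value) is not None


def encode_cookie_value(text: str) -> str:
    """Hypothetical helper: Base64 of the UTF-8 bytes of text.

    The standard Base64 alphabet (A-Z, a-z, 0-9, "+", "/", "=") lies
    entirely inside the cookie-octet range, so the result is always valid.
    """
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_cookie_value(encoded: str) -> str:
    """Inverse of encode_cookie_value (same assumptions)."""
    return base64.b64decode(encoded).decode("utf-8")


def decode_encoded_word(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


if __name__ == "__main__":
    raw = "abc\u00e9"  # "abcé", the kind of value from the original question
    print(is_valid_cookie_value(raw))
    safe = encode_cookie_value(raw)
    print(safe, is_valid_cookie_value(safe))
    print(decode_encoded_word("=?UTF-8?B?QXV0bw==?="))
```

Whether Base64 or RFC 2047 encoding is the better choice depends on who reads
the cookie back: the application that sets a Base64 value must also decode it,
since neither browsers nor the servlet container will do so on their own.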