From: Konstantin Preißer
To: 'Tomcat Users List'
Subject: RE: Tomcat 8.5.19 corrupts static text files encoded with UTF-8
Date: Sun, 30 Jul 2017 10:59:29 +0200

Hi Mark,

> -----Original Message-----
> From: Mark Thomas [mailto:markt@apache.org]
> Sent: Saturday, July 29, 2017 2:56 PM
>
>> (...)
>>
> >Why would Tomcat want to modify static files, instead of just serving
> >them as-is?
>
> Because Tomcat now checks the response encoding and the file encoding
> and converts if necessary.
>
> You probably want to set the fileEncoding init param of the Default servlet to
> UTF-8.

Thanks. So I set the following parameter in web.xml:

    <init-param>
        <param-name>fileEncoding</param-name>
        <param-value>utf-8</param-value>
    </init-param>

The result now is that Tomcat converts the static file (which has no BOM) from UTF-8 to ISO-8859-1, which means the JavaScript files included by my HTML page will still be broken, as the browser expects them to be UTF-8-encoded...

I honestly don't understand that change. As a web developer, I expect a web server to serve static files exactly as-is, without trying to convert them into another charset and without trying to detect their charset (unless the server is configured to do so).

Bug 49464 [1] mentions that "As per spec the encoding of the page is asssumed to be iso-8859-1." Do I understand correctly that this refers to the following passage from section "3.7.1 Canonicalization and Text Defaults" of RFC 2616?

   (...)
   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP.

But note that RFC 7231 says in "Appendix B. Changes from RFC 2616":

   The default charset of ISO-8859-1 for text media types has been
   removed; the default is now whatever the media type definition says.
   Likewise, special treatment of ISO-8859-1 has been removed from the
   Accept-Charset header field.
   (Section 3.1.1.3 and Section 5.3.3)

I found the following page that discusses this change [2]; it mentions RFC 6657 [3], which describes charset handling in text/* media type registrations.

While RFC 6657 seems to indicate that the default charset of text/plain is US-ASCII (which is not what browsers do), it doesn't seem to specify a default charset for other types like text/html, text/javascript, application/javascript, etc.

Browsers (I tested with IE, Firefox and Chrome) already handle the encoding of text-based files whose Content-Type doesn't specify a charset the way the user would expect:

- For example, for text/html files without a BOM, they respect the charset declared in the <meta> element. If a UTF-8 BOM is present, they interpret the file as UTF-8.
- If you directly open text/plain, text/css or application/javascript files in a browser, it checks whether the file has a UTF-8 BOM and, if so, interprets it as UTF-8; otherwise, it seems to interpret it as ISO-8859-1/Windows-1252 (or maybe using the default system encoding; I'm not exactly sure about that).
- However, if such files (.css and .js) are referenced by an HTML file and have no BOM, browsers interpret them in the same encoding as the HTML file. This means that if the HTML uses UTF-8, they will also interpret the .js and .css files as UTF-8 (unless the referencing HTML element specifies a charset attribute, e.g. <script src="..." charset="...">).

Therefore, I don't see why Tomcat would want to convert static resources to other encodings. (I also think it should not try to detect the charset of files and then add a "; charset=..." parameter to the Content-Type, as this may override the browser's behavior and thus also lead to incorrect decoding of JavaScript files that are encoded in UTF-8 without a BOM.)
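To illustrate the breakage described above, here is a minimal, self-contained Java sketch, assuming a script saved as UTF-8 without a BOM and an HTML page that uses UTF-8. The names CharsetDemo, transcodeUtf8ToLatin1 and sniffCharset are hypothetical: transcodeUtf8ToLatin1 only mimics a "read as UTF-8, write as ISO-8859-1" round trip and is not Tomcat's actual DefaultServlet code, and sniffCharset is a simplified approximation of the browser behavior listed above, not any browser's real detection algorithm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical, simplified approximation of the browser behavior described
    // above: a leading UTF-8 BOM (EF BB BF) selects UTF-8; otherwise the caller's
    // fallback is used (e.g. the encoding of the referencing HTML page).
    static Charset sniffCharset(byte[] content, Charset fallback) {
        boolean hasUtf8Bom = content.length >= 3
                && content[0] == (byte) 0xEF
                && content[1] == (byte) 0xBB
                && content[2] == (byte) 0xBF;
        return hasUtf8Bom ? StandardCharsets.UTF_8 : fallback;
    }

    // Mimics a server that reads a file as UTF-8 and writes it as ISO-8859-1.
    // Characters with no ISO-8859-1 mapping (such as U+20AC, the euro sign)
    // become '?', because String.getBytes(Charset) substitutes the charset's
    // default replacement bytes for unmappable characters.
    static byte[] transcodeUtf8ToLatin1(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A script line containing o-umlaut, sharp s and the euro sign,
        // saved as UTF-8 without a BOM.
        byte[] onDisk = "var msg = \"Gr\u00f6\u00dfe: 5 \u20ac\";"
                .getBytes(StandardCharsets.UTF_8);

        byte[] served = transcodeUtf8ToLatin1(onDisk);

        // The page is UTF-8 and the served script has no BOM, so the
        // browser decodes the script as UTF-8.
        Charset assumed = sniffCharset(served, StandardCharsets.UTF_8);
        String seenByBrowser = new String(served, assumed);

        System.out.println("Served bytes: " + served.length
                + " (on disk: " + onDisk.length + ")");
        System.out.println("Decoded as " + assumed + ": " + seenByBrowser);
    }
}
```

Running it shows that the served script has fewer bytes than the file on disk: the euro sign has already been replaced by '?', and the single-byte ISO-8859-1 umlauts are not valid UTF-8, so a decoder that assumes UTF-8 (as the browser does here) cannot recover the original characters.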
Further, as a system administrator, I would expect to be able to update Tomcat from x.y.z to x.y.(z+n) without static JavaScript files suddenly breaking (which isn't immediately obvious, as the script itself will mostly still work; only some string characters outside of ASCII are displayed incorrectly to the user).

Shouldn't such behavior changes be reserved for the next major/minor version, which is not yet stable, in this case Tomcat 9.0.0?

Thanks!

Regards,
Konstantin Preißer

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=49464
[2] https://github.com/requests/requests/issues/2086
[3] https://tools.ietf.org/html/rfc6657