From: Andre-John Mas
To: "Tomcat Developers List"
Subject: Re: UTF-8 POST request results in corrupted data
Date: Mon, 6 Oct 2008 20:58:17 -0400
Message-Id: <0DDB2311-4BF5-473A-B399-1AAD8AD36BD7@gmail.com>
In-Reply-To: <48EA9B3F.4090507@joedog.org>
References: <67ca80b70810061233x476bdef9y34ffbf041b6c7878@mail.gmail.com> <48EA9B3F.4090507@joedog.org>

Just to repeat what I stated in the ticket:

The problem I have with the suggested approach is that it treats UTF-8
as an exception, rather than the norm, for my whole application server.
I am not sure that I should have to specify the encoding before handling
every request. For a web site that is completely in UTF-8, that is a lot
of duplicated code.

Also, I ask the question: why should we allow one behaviour for the URI
in the container and not allow the same for the POST body?
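
To make the corruption concrete, here is a minimal sketch in plain Java, independent of Tomcat. It assumes the POST body bytes arrive as UTF-8 and are then decoded with ISO-8859-1, which is the behaviour described in the ticket. The class and method names (MojibakeDemo, decode, hex) are made up for illustration and are not Tomcat code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: the same bytes decoded with two different charsets.
final class MojibakeDemo {

    // Decode raw body bytes with the given charset, roughly what a
    // container does once it has settled on a request encoding.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    // Render a string as hex code points so the result does not depend
    // on the console encoding.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'e' with acute accent, as a browser would send it in a UTF-8 form post.
        byte[] wire = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each of the two UTF-8 bytes becomes its own character.
        System.out.println(hex(decode(wire, StandardCharsets.ISO_8859_1)));

        // Right charset: the two bytes are read back as the single original character.
        System.out.println(hex(decode(wire, StandardCharsets.UTF_8)));
    }
}
```

The point is only that the bytes themselves are fine; the damage comes from the charset chosen when they are decoded, which is why where that choice is made matters.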

André

On 6-Oct-08, at 19:11, Tim Funk wrote:

> Before reading the POST body, you should first be doing this:
> request.setCharacterEncoding("UTF-8")
>
> -Tim
>
> André-John Mas wrote:
>> Hi,
>> I have opened issue 45957, for an issue that has bothered me for a
>> while:
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=45957
>> To summarize:
>> Currently in Tomcat 5, if a request is received containing UTF-8
>> content, then any accents or non-Roman characters are corrupted, since
>> there is an assumption that the POST request is ISO-8859-1 (latin1).
>> For example, 'é' becomes 'Ã©'.
>> Has anyone looked into this as part of a separate task? Otherwise I
>> would be willing to see what could be done.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org