Message-ID: <1323700212.3776.20.camel@ubuntu>
Subject: Re: Is it possible to have this mail parsed correctly?
From: Oleg Kalnichevski
To: mime4j-dev@james.apache.org
Date: Mon, 12 Dec 2011 15:30:12 +0100
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

On Mon, 2011-12-12 at 10:52 +0100, Lukáš Vlček wrote:
> Hi Stefano,
>
> Thanks for the analysis. I extracted this use case to the following test:
> https://github.com/lukas-vlcek/mime4j-test/blob/master/src/test/java/org/mime4j/test/BasicTest.java#L45
>
> Now, the question is why Mailman is able to render the output correctly if
> the charset and the encoding used in the body are not in sync. Maybe the
> encoding of the message file was changed when I copied the file from
> the server to my local dev machine... or is it just coincidence? I do not
> know... just thinking out loud...
>
> Regards,
> Lukas
>

This is the hex dump of the message, which suggests the message body content is UTF-8 encoded, while the Content-Type header declares ISO-8859-1 as the content charset.

00000860 6E 79 6F 6E 65 20 74 68 65 72 65 3F 20 3A 29 0A 0A 2D 2D 0A 47 61 6C 64 65 72 20 5A 61 6D 61 72 nyone there? :)..--.Galder Zamar
00000880 72 65 C3 B1 6F 0A 53 72 2E 20 53 6F 66 74 77 61 72 65 20 4D 61 69 6E 74 65 6E 61 6E 63 65 20 45 re..o.Sr. Software Maintenance E

It could be that the message was modified while being copied, or it could be that Mailman employs some sort of content type / charset detection mechanism. In any case, mime4j correctly decoded the message based on its metadata.

Oleg

> On Fri, Dec 9, 2011 at 4:35 PM, Stefano Bagnara wrote:
>
> > 2011/12/7 Lukáš Vlček :
> > > Hi,
> > >
> > > The following is the eml source of a short mail:
> > > https://gist.github.com/5a9b383c1dc048fac6d4
> > >
> > > The following is a link to the public (Mailman) pipermail rendered
> > > representation of the same mail:
> > > http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> > >
> > > Note how the signature in the footer of the email contains the name "Zamarreño".
> > >
> > > When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> > > and 0.7.1).
> > >
> > > Is mime4j able to parse this mail the same way as Mailman (2.1.9) does?
> >
> > mime4j is doing the right thing.
> > The message declares the charset as ISO-8859-1 and then uses a UTF-8
> > sequence.
> > So if you really want to use ñ in an ISO-8859-1 message, make sure you
> > also use the right bytes (F1 is the right ISO-8859-1 byte, whereas "C3 B1"
> > is the UTF-8 sequence).
> >
> > The gist is displayed correctly in your browser because your browser
> > uses UTF-8 to show it to you: force it to ISO-8859-1 and you will see
> > the same sequence that mime4j gives you.
> >
> > Stefano
> >
> > > Regards,
> > > Lukas
> >
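
For anyone following the thread, here is a minimal Java sketch of the mismatch discussed above. It decodes the bytes 72 65 C3 B1 6F from offset 0x880 of the hex dump twice: once as ISO-8859-1 (the declared charset) and once as UTF-8 (what the bytes appear to be). The class name CharsetMismatchDemo and the helpers decode and isValidUtf8 are made up for illustration; only standard JDK calls are used, and nothing here is mime4j's or Mailman's actual code. The strict UTF-8 check is just one plausible detection heuristic, not a claim about what Mailman does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of mime4j.
final class CharsetMismatchDemo {

    // "re?o" as it appears in the hex dump: 72 65 C3 B1 6F
    static final byte[] FROM_DUMP = {0x72, 0x65, (byte) 0xC3, (byte) 0xB1, 0x6F};

    // The same text with the single ISO-8859-1 byte F1 for the n-tilde.
    static final byte[] LATIN1_FORM = {0x72, 0x65, (byte) 0xF1, 0x6F};

    // Decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Strict check: true only if the bytes form well-formed UTF-8.
    // Assumption: offered as one possible heuristic a renderer might use.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Decoding with the declared charset yields two characters for C3 B1.
        String asDeclared = decode(FROM_DUMP, StandardCharsets.ISO_8859_1);
        // Decoding as UTF-8 yields the single character U+00F1.
        String asUtf8 = decode(FROM_DUMP, StandardCharsets.UTF_8);

        // Console rendering depends on the terminal encoding, so lengths
        // are printed as well.
        System.out.println("ISO-8859-1: " + asDeclared + " (" + asDeclared.length() + " chars)");
        System.out.println("UTF-8:      " + asUtf8 + " (" + asUtf8.length() + " chars)");
        System.out.println("dump bytes are valid UTF-8:   " + isValidUtf8(FROM_DUMP));
        System.out.println("latin1 bytes are valid UTF-8: " + isValidUtf8(LATIN1_FORM));
    }
}
```

A parser that trusts the Content-Type header, as mime4j does here, takes the first path. A renderer that first tries strict UTF-8 and falls back to the declared charset would take the second, which is one way the pipermail page could end up showing the intended character.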