james-mime4j-dev mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: Is it possible to have this mail parsed correctly?
Date Mon, 12 Dec 2011 14:30:12 GMT
On Mon, 2011-12-12 at 10:52 +0100, Lukáš Vlček wrote:
> Hi Stefano,
> 
> Thanks for the analysis. I extracted this use case to the following test:
> https://github.com/lukas-vlcek/mime4j-test/blob/master/src/test/java/org/mime4j/test/BasicTest.java#L45
> 
> Now, the question is why Mailman is able to render the output correctly if
> the declared charset and the encoding actually used in the body are not in
> sync. Maybe the encoding of the message file was changed when I copied the
> file from the server to my local dev machine... or is it just a coincidence?
> I do not know... just thinking out loud...
> 
> Regards,
> Lukas
> 

This is the hex dump of the message. It suggests that the message body
is UTF-8 encoded, while the Content-Type header declares ISO-8859-1 as
the content charset.

00000860   6E 79 6F 6E  65 20 74 68  65 72 65 3F  20 3A 29 0A  0A 2D 2D 0A  47 61 6C 64  65 72 20 5A  61 6D 61 72  nyone there? :)..--.Galder Zamar
00000880   72 65 C3 B1  6F 0A 53 72  2E 20 53 6F  66 74 77 61  72 65 20 4D  61 69 6E 74  65 6E 61 6E  63 65 20 45  re..o.Sr. Software Maintenance E
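
The effect of the mismatch can be reproduced with plain Java, independent
of mime4j. The following is only a minimal sketch: it takes the ten bytes of
the name from the dump above and decodes them once per charset. The class and
method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {
    // "Zamarre", then C3 B1, then "o", as seen in the hex dump.
    static final byte[] NAME = {
        0x5A, 0x61, 0x6D, 0x61, 0x72, 0x72, 0x65,
        (byte) 0xC3, (byte) 0xB1, 0x6F
    };

    // Decode the same bytes with whichever charset the caller picks.
    static String decodeAs(Charset charset) {
        return new String(NAME, charset);
    }

    public static void main(String[] args) {
        // UTF-8 reads C3 B1 as one character, U+00F1 (n with tilde).
        System.out.println(decodeAs(StandardCharsets.UTF_8));
        // ISO-8859-1 maps each byte to one character: U+00C3 then U+00B1.
        System.out.println(decodeAs(StandardCharsets.ISO_8859_1));
    }
}
```

What the console shows also depends on the terminal encoding, so comparing
the string lengths (9 versus 10 characters) is the more reliable check.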

It may be that the message was modified when it was copied, or that
Mailman employs some sort of content type / charset detection mechanism.
In any case, mime4j decoded the message correctly based on its metadata.
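
For illustration, one simple form such a detection mechanism could take is
to try a strict UTF-8 decode first and fall back to the declared charset only
if the bytes are not valid UTF-8. This is a sketch of that heuristic under
stated assumptions, not a description of what Mailman actually does and not
part of the mime4j API; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class LenientBodyDecoder {
    // Prefer UTF-8 when the bytes are well-formed UTF-8; otherwise trust
    // the charset declared in the message headers.
    static String decode(byte[] body, Charset declared) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(body)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so the declared charset is the best guess.
            return new String(body, declared);
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = {0x72, 0x65, (byte) 0xC3, (byte) 0xB1, 0x6F};
        byte[] latin1Bytes = {0x72, 0x65, (byte) 0xF1, 0x6F};
        // Both calls yield the same four characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length());
        System.out.println(decode(latin1Bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```

The heuristic is not foolproof: a genuine ISO-8859-1 body whose bytes happen
to form valid UTF-8 would be misread, so overriding the declared charset is a
trade-off rather than a fix.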

Oleg 


> On Fri, Dec 9, 2011 at 4:35 PM, Stefano Bagnara <apache@bago.org> wrote:
> 
> > 2011/12/7 Lukáš Vlček <lukas.vlcek@gmail.com>:
> > > Hi,
> > >
> > > The following is a eml source of a short mail:
> > > https://gist.github.com/5a9b383c1dc048fac6d4
> > >
> > > The following is a link to public (Mailman) pipermail rendered
> > > representation of the same mail:
> > >
> > http://lists.jboss.org/pipermail/jboss-cluster-dev/2008-April/000000.html
> > >
> > > Note how the sign in the footer of the email contains name "Zamarreño".
> > >
> > > When using mime4j I am getting "ZamarreÃ±o" instead (tested with both 0.6
> > > and 0.7.1).
> > >
> > > Is mime4j able to parse this mail the same way Mailman (2.1.9) does?
> >
> > mime4j is doing the right thing.
> > The message declares the charset as ISO-8859-1 and then uses a UTF-8
> > sequence.
> > So if you really want to use ñ in an ISO-8859-1 message, make sure you
> > also use the right bytes (F1 is the correct ISO-8859-1 byte, whereas
> > "C3 B1" is the UTF-8 sequence).
> >
> > The gist is displayed correctly in your browser because your browser
> > uses UTF-8 to show it to you: force it to ISO-8859-1 and you will see
> > the same sequence that mime4j gives you.
> >
> > Stefano
> >
> > > Regards,
> > > Lukas
> >


