james-mime4j-dev mailing list archives

From Oleg Kalnichevski <ol...@apache.org>
Subject Re: Switching from 0.6 to 0.7.1 - address parsing, CRLF issues
Date Sun, 18 Dec 2011 12:22:02 GMT
On Sun, 2011-12-18 at 12:57 +0100, Stefano Bagnara wrote:
> 2011/12/18 Lukáš Vlček <lukas.vlcek@gmail.com>:
> > Hi again,
> >
> > there is another difference between 0.6 and 0.7.1.
> >
> > It happens when I need to parse a message with the unusual charset 'x-gbk'.
> >
> > I added a new test case to my GitHub repo for both the 0.6 and 0.7.1 versions.
> > Basically, the problem happens when I try to get a Reader from the TextBody:
> >
> > Message message = getMessage(....);
> > TextBody body = (TextBody) message.getBody();
> > body.getReader(); // the exception is thrown here
> >
> > It worked fine in 0.6, but in 0.7.1 it throws UnsupportedEncodingException,
> > which is quite unfortunate because I do not see a way to extract the body
> > content in this case (as I need to get the Reader first).
> >
> > Here are examples:
> > 0.6
> > https://github.com/lukas-vlcek/mime4j-test/blob/backto06/src/test/java/org/mime4j/test/BasicTest.java#L134
> > 0.7.1
> > https://github.com/lukas-vlcek/mime4j-test/blob/workaround/src/test/java/org/mime4j/test/BasicTest.java#L134
> >
> > Any idea how to deal with this situation?
> >
> > Note, I understand that 'x-gbk' is an unknown charset for the JVM, but even in
> > that case I was able to extract the body content in 0.6 by forcing the 'gbk'
> > charset in my utility classes. With 0.7.1 I am unable to do that because the
> > very basic body.getReader() call fails. Is there any workaround to get the
> > body Reader without getting the exception?
> 
> Please open a JIRA issue, attach your test case and the stack trace of
> the exception you get.
> I guess we can classify this as a bug because we want to give users a
> way to deal with malformed input, including unrecognized charsets.
> 
> I don't know if we already have an existing workaround, but the JIRA
> issue will in any case help keep track of the problem for future users.
> 
> Stefano


There's no need to raise a JIRA issue, because basically there is no issue.
#getReader() is merely a convenience method. One can always use
#getInputStream() to get the raw data stream and decode it with whatever
charset one chooses.

Lukáš, what is wrong with that?

---
// Parse the sample message.
Message message = ParserUtil.getMessage(
        getInputStream("mbox/jboss-as7-dev-01.mbox"));

assertTrue(message.getBody() instanceof TextBody);
TextBody body = (TextBody) message.getBody();

// Map the charset name the JVM does not recognize to one it does.
String charset = message.getCharset();
if (charset.equalsIgnoreCase("x-gbk")) {
    charset = "gbk";
}
// Decode the raw body stream directly instead of calling getReader().
Reader reader = new InputStreamReader(body.getInputStream(), charset);
---
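
The same idea can be generalized beyond the single 'x-gbk' case. The
following is only a sketch using plain JDK classes, not anything mime4j
provides: the class LenientCharsets and its methods resolve, stripXPrefix
and openReader are hypothetical names, and falling back to ISO-8859-1 is an
assumption made so that decoding never fails outright. It tries the declared
name first, then the name with a leading "x-" removed, and only then the
fallback.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mime4j.
class LenientCharsets {

    // Resolve a declared charset name; return the fallback if the JVM
    // supports neither the name itself nor the name without its "x-" prefix.
    static Charset resolve(String declared, Charset fallback) {
        if (declared == null) {
            return fallback;
        }
        String name = declared.trim();
        for (String candidate : new String[] { name, stripXPrefix(name) }) {
            try {
                if (!candidate.isEmpty() && Charset.isSupported(candidate)) {
                    return Charset.forName(candidate);
                }
            } catch (IllegalCharsetNameException e) {
                // Syntactically invalid name: try the next candidate.
            }
        }
        return fallback;
    }

    // "x-gbk" -> "gbk"; names without the prefix are returned unchanged.
    static String stripXPrefix(String name) {
        return name.regionMatches(true, 0, "x-", 0, 2) ? name.substring(2) : name;
    }

    // Wrap a raw body stream in a Reader using the resolved charset.
    // ISO-8859-1 as the last resort is an assumption of this sketch.
    static Reader openReader(InputStream raw, String declared) {
        return new InputStreamReader(raw, resolve(declared, StandardCharsets.ISO_8859_1));
    }
}
```

With this in place, the last line of the snippet above could become
LenientCharsets.openReader(body.getInputStream(), message.getCharset()).
Whether 'gbk' itself resolves depends on the charsets installed in the JVM.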


