james-mime4j-dev mailing list archives

From "Matthias Huttar (Updated) (JIRA)" <mime4j-...@james.apache.org>
Subject [jira] [Updated] (MIME4J-214) Writing Multipart Mails with DefaultMessageWriter can produce unreadable mails
Date Fri, 16 Mar 2012 09:07:40 GMT

     [ https://issues.apache.org/jira/browse/MIME4J-214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Huttar updated MIME4J-214:
-----------------------------------

    Description: 
I've found that the DefaultMessageWriter's handling of Content-Transfer-Encoding is too naive
for multipart mails: if the mail headers define a Content-Transfer-Encoding, the complete
multipart entity is encoded with it. This, though, may lead to unparseable
mails:
- Mail headers in the parts of the multipart get encoded, e.g. {{Content-Type: text/plain;
charset=3Diso-8859-1}}
- Multipart boundaries get escaped. E.g. with quoted-printable and Content-Type: {{multipart/mixed;
boundary="=SIMPLEBOUNDARY="}}, the boundary is encoded to =3DSIMPLEBOUNDARY=3D, which then cannot
be parsed by major mail clients (Outlook, Thunderbird, some webmail clients, as we've heard
from some users). Mime4J, by the way, is not able to parse the generated mail correctly either.
- We've heard of various weird issues with mails from our users that relate to this
bug (e.g. attachments get lost, the charset is broken, mails are empty, the entire mail body is shown
as plain text (showing the HTML source), ...). However, I could not reproduce all of those
with a unit test.
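
The first two bullets above can be reproduced without Mime4J. The following is a minimal sketch, not Mime4J's codec: qpEncodeLine is a hypothetical helper that implements only the part of quoted-printable that matters here (escaping '=' and bytes outside printable ASCII as =XX; soft line breaks and trailing whitespace are ignored). It shows why a delimiter line or a part header that has passed through the encoder no longer matches what a parser expects.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: a simplified quoted-printable line encoder (hypothetical helper, not Mime4J code). */
class QpBoundaryDemo {

    /** Escapes '=' and every byte outside printable ASCII as =XX. Soft line breaks are not handled. */
    static String qpEncodeLine(String line) {
        StringBuilder sb = new StringBuilder();
        for (byte b : line.getBytes(StandardCharsets.ISO_8859_1)) {
            int v = b & 0xFF;
            if (v == '=' || v < 32 || v > 126) {
                sb.append('=').append(String.format("%02X", v));
            } else {
                sb.append((char) v);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String boundary = "=SIMPLEBOUNDARY=";      // as declared in the Content-Type header
        String delimiter = "--" + boundary;         // the line a parser searches for in the body
        String written = qpEncodeLine(delimiter);   // the line that is written if the whole multipart is encoded

        System.out.println(written);                     // --=3DSIMPLEBOUNDARY=3D
        System.out.println(written.equals(delimiter));   // false: the delimiter is no longer recognizable
        // Part headers are damaged in the same way:
        System.out.println(qpEncodeLine("Content-Type: text/plain; charset=iso-8859-1"));
        // Content-Type: text/plain; charset=3Diso-8859-1
    }
}
```

Only the bodies of the individual parts should go through an encoder like this; delimiter lines and part headers have to be written unencoded.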

Below you can find a unit test exposing the issue and a proposed solution. Unfortunately,
this proposal does not implement the MessageWriter interface, because doing the content-transfer-encoding
right requires a considerable amount of state to be passed along. I could not get this to work
while retaining the interface.

Unit test and bugfix for the message writer are attached.

  was:
I've found that the DefaultMessageWriter's handling of Content-Transfer-Encoding is too naive
for multipart mails: if the mail headers define a Content-Transfer-Encoding, the complete
multipart entity is encoded with it. This, though, may lead to unparseable
mails:
- Mail headers in the parts of the multipart get encoded, e.g. {{Content-Type: text/plain;
charset=3Diso-8859-1}}
- Multipart boundaries get escaped. E.g. with quoted-printable and Content-Type: {{multipart/mixed;
boundary="=SIMPLEBOUNDARY="}}, the boundary is encoded to =3DSIMPLEBOUNDARY=3D, which then cannot
be parsed by major mail clients (Outlook, Thunderbird, some webmail clients, as we've heard
from some users). Mime4J, by the way, is not able to parse the generated mail correctly either.
- We've heard of various weird issues with mails from our users that relate to this
bug (e.g. attachments get lost, the charset is broken, mails are empty, the entire mail body is shown
as plain text (showing the HTML source), ...). However, I could not reproduce all of those
with a unit test.

Below you can find a unit test exposing the issue and a proposed solution. Unfortunately,
this proposal does not implement the MessageWriter interface, because doing the content-transfer-encoding
right requires a considerable amount of state to be passed along. I could not get this to work
while retaining the interface.

Unit test (please also put the fix below on your classpath for it to work):
{code}
package com.ecg.replyts.test.mailparser.mime4j;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.james.mime4j.dom.Message;
import org.apache.james.mime4j.dom.Multipart;
import org.apache.james.mime4j.message.DefaultMessageBuilder;
import org.apache.james.mime4j.message.DefaultMessageWriter;
import org.junit.Before;
import org.junit.Test;

import com.ecg.replyts.mailparser.mime4j.MimeMessageWriter;

public class ContentTransferEncodingMime4JExposer {

    static final String INPUT_MAIL = "Return-Path: <hello1@you.com>\n" +
            "From: hello1@you.com\n" +
            "To: to1@you.com\n" +
            "Date: Mon, 27 Feb 2012 18:09:14 +0100\n" +
            "Content-Type: multipart/mixed;\n" +
            " boundary=\"=_cbaa2da1a4419c7f8c5c8f9b16aa3ebe\"\n" +
            "Content-Transfer-Encoding: quoted-printable\n" +
            "Content-Disposition: inline\n" +
            "MIME-Version: 1.0\n" +
            "\n" +
            "This is a message in Mime Format." +
            "\n" +
            "--=_cbaa2da1a4419c7f8c5c8f9b16aa3ebe\n" +
            "Content-Type: text/html; charset=iso-8859-1\n" +
            "Content-Transfer-Encoding: quoted-printable\n" +
            "Content-Disposition: inline\n" +
            "\n" +
            "<h1>Hello World</h1>\n" +
            "\n" +
            "--=_cbaa2da1a4419c7f8c5c8f9b16aa3ebe\n" +
            "Content-Type: text/plain; charset=iso-8859-1\n" +
            "Content-Transfer-Encoding: quoted-printable\n" +
            "Content-Disposition: inline\n" +
            "\n" +
            "If you believe that truth=3Dbeauty, then surely=20=\n" +
            "mathematics is the most beautiful branch of philosophy.\n" +
            "--=_cbaa2da1a4419c7f8c5c8f9b16aa3ebe";
    private Message parsedMessage;


    @Before
    public void setup() throws Exception {
        // Use an explicit charset so the test does not depend on the platform default.
        ByteArrayInputStream ins = new ByteArrayInputStream(INPUT_MAIL.getBytes("US-ASCII"));
        parsedMessage = new DefaultMessageBuilder().parseMessage(ins);

    }

    /**
     * Given the test message is a valid multipart message with two parts. <br/>
     * When loading the message. <br/>
     * Then the parsed Message object should contain a multipart body with two parts.
     */
    @Test
    public void messageIsMultipartWhenLoaded() throws Exception {
        // The test message is a valid multipart message and is loaded just fine
        assertTrue(parsedMessage.getBody() instanceof Multipart);
        Multipart bm = (Multipart) parsedMessage.getBody();
        assertEquals(2, bm.getBodyParts().size());
    }

    /**
     * Given the test message is a valid multipart message with two parts. <br/>
     * When loading the message and storing it again.<br/>
     * Then the generated message should still be a multipart message with two parts.
     */
    @Test
    public void messageIsMultipartWhenSavedAndLoaded() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new DefaultMessageWriter().writeMessage(parsedMessage, out);
        /*
         * The multipart separators were escaped: --=_cbaa2da1a4419c7f8c5c8f9b16aa3ebe became
         * --=3D_cbaa2da1a4419c7f8c5c8f9b16aa3ebe (the equals sign that was part of the separator got escaped).
         * Therefore the separators are no longer recognized when parsing.
         *
         * Also, the mail headers of the sub parts were escaped as well (see charset=...):
         * Content-Type: text/plain; charset=3Diso-8859-1
         */
        System.out.println(new String(out.toByteArray(), "US-ASCII"));
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        Message reparsedMessage = new DefaultMessageBuilder().parseMessage(in);

        assertTrue(reparsedMessage.getBody() instanceof Multipart);
        assertEquals(2, ((Multipart) reparsedMessage.getBody()).getBodyParts().size());
    }
    
    /**
     * Given the test message is a valid multipart message with two parts. <br/>
     * When loading the message and storing it (with fixed Message Writer) again.<br/>
     * Then the generated message should still be a multipart message with two parts.
     */
    @Test
    public void messageIsMultipartWhenSavedAndLoadedWithImprovedMessageWriter() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new MimeMessageWriter(parsedMessage).write(out); 
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        Message reparsedMessage = new DefaultMessageBuilder().parseMessage(in);

        assertTrue(reparsedMessage.getBody() instanceof Multipart);
        assertEquals(2, ((Multipart) reparsedMessage.getBody()).getBodyParts().size());
    }
}
{code}


{code}
package com.ecg.replyts.mailparser.mime4j;

import java.io.IOException;
import java.io.OutputStream;
import java.util.Stack;

import org.apache.james.mime4j.codec.CodecUtil;
import org.apache.james.mime4j.dom.BinaryBody;
import org.apache.james.mime4j.dom.Body;
import org.apache.james.mime4j.dom.Entity;
import org.apache.james.mime4j.dom.Header;
import org.apache.james.mime4j.dom.Message;
import org.apache.james.mime4j.dom.MessageWriter;
import org.apache.james.mime4j.dom.Multipart;
import org.apache.james.mime4j.dom.SingleBody;
import org.apache.james.mime4j.dom.field.ContentTypeField;
import org.apache.james.mime4j.dom.field.FieldName;
import org.apache.james.mime4j.stream.Field;
import org.apache.james.mime4j.util.ByteArrayBuffer;
import org.apache.james.mime4j.util.ByteSequence;
import org.apache.james.mime4j.util.ContentUtil;
import org.apache.james.mime4j.util.MimeUtil;

/**
 * Copy of Mime4J's {@link org.apache.james.mime4j.message.DefaultMessageWriter}, whose
 * content-transfer-encoding approach is too naive and therefore breaks when writing multipart mails: <br/>
 * it encodes the multipart separators as well as the mail headers of the individual parts of that mail.
 * <p>
 * This implementation aims to do the content-transfer-encoding right.
 * <p>
 * <strong>TODO:</strong> How to handle nested Content-Transfer-Encodings? E.g. the mail declares itself as
 * quoted-printable and all parts do as well? Or: a mail in quoted-printable contains a mail in base64. Should those
 * cases be escaped twice? For the first case, definitely not (as seen in many mails, this is not desired). But what
 * about the second case? Could not find an exact spec for this.<br/>
 * Right now we don't do nested encodings, but keep a stack of encodings from which we only take the top element.
 * Thus every body is encoded in the right format and no nested encoding is done.
 */
public class MimeMessageWriter {

    private static final byte[] CRLF = { '\r', '\n' };
    private static final byte[] DASHES = { '-', '-' };
    private final Message message;

    private enum BodyEncodig {
        BASE64, QUOTED_PRINTABLE, QUOTED_PRINTABLE_BINARY, NONE
    };

    private final Stack<BodyEncodig> bodyEncodigs = new Stack<MimeMessageWriter.BodyEncodig>();


    public MimeMessageWriter(Message message) {
        this.message = message;
    }

    public void write(OutputStream os) throws IOException {
        writeMessage(message, os);
        os.close();
    }

    private void writeBody(Body body, OutputStream out) throws IOException {
        if (body instanceof Message) {
            writeEntity((Message) body, out);
        } else if (body instanceof Multipart) {
            writeMultipart((Multipart) body, out);
        } else if (body instanceof SingleBody) {
            OutputStream encoded = encodeStream(this.bodyEncodigs.peek(), out);
            ((SingleBody) body).writeTo(encoded);
            if (encoded != out) {
                encoded.close();
            }
        } else
            throw new IllegalArgumentException("Unsupported body class");
    }

    private void writeEntity(Entity entity, OutputStream out) throws IOException {
        final Header header = entity.getHeader();
        if (header == null)
            throw new IllegalArgumentException("Missing header");

        writeHeader(header, out);

        final Body body = entity.getBody();
        if (body == null)
            throw new IllegalArgumentException("Missing body");


        bodyEncodigs.push(getEncoding(entity));
        writeBody(body, out);
        bodyEncodigs.pop();


    }

    private void writeMessage(Message message, OutputStream out) throws IOException {
        writeEntity(message, out);
    }

    private void writeMultipart(Multipart multipart, OutputStream out)
            throws IOException {
        ContentTypeField contentType = getContentType(multipart);

        ByteSequence boundary = getBoundary(contentType);

        // Ignore Epilogue. Dead MIME Mail Feature - there's no use to it.
        ByteSequence preamble = multipart.getPreamble() != null
                ? ContentUtil.encode(multipart.getPreamble())
                : null;
        if (preamble != null) {
            OutputStream encoded = encodeStream(bodyEncodigs.peek(), out);
            writeBytes(preamble, encoded);
            if (encoded != out) {
                encoded.close();
            }
            out.write(CRLF);
        }

        for (Entity bodyPart : multipart.getBodyParts()) {
            out.write(DASHES);
            writeBytes(boundary, out);
            out.write(CRLF);

            writeEntity(bodyPart, out);
            out.write(CRLF);
        }

        out.write(DASHES);
        writeBytes(boundary, out);
        out.write(DASHES);
        out.write(CRLF);
    }

    private void writeField(Field field, OutputStream out) throws IOException {
        ByteSequence raw = field.getRaw();
        if (raw == null) {
            StringBuilder buf = new StringBuilder();
            buf.append(field.getName());
            buf.append(": ");
            String body = field.getBody();
            if (body != null) {
                buf.append(body);
            }
            raw = ContentUtil.encode(MimeUtil.fold(buf.toString(), 0));
        }
        writeBytes(raw, out);
        out.write(CRLF);
    }

    private void writeHeader(Header header, OutputStream out) throws IOException {
        for (Field field : header) {
            writeField(field, out);
        }

        out.write(CRLF);
    }


    protected OutputStream encodeStream(BodyEncodig enc, OutputStream os) throws IOException {
        switch (enc) {
        case BASE64:
            return CodecUtil.wrapBase64(os);
        case QUOTED_PRINTABLE:
            return CodecUtil.wrapQuotedPrintable(os, false);
        case QUOTED_PRINTABLE_BINARY:
            return CodecUtil.wrapQuotedPrintable(os, true);
        case NONE:
            return os;
        default:
            throw new IllegalArgumentException(enc + " not supported");
        }
    }

    private BodyEncodig getEncoding(Entity e) {
        String cte = e.getContentTransferEncoding();
        if (MimeUtil.isBase64Encoding(cte)) {
            return BodyEncodig.BASE64;
        } else if (MimeUtil.isQuotedPrintableEncoded(cte)) {
            return e.getBody() instanceof BinaryBody
                    ? BodyEncodig.QUOTED_PRINTABLE_BINARY
                    : BodyEncodig.QUOTED_PRINTABLE;
        }
        return BodyEncodig.NONE;
    }

    private ContentTypeField getContentType(Multipart multipart) {
        Entity parent = multipart.getParent();
        if (parent == null)
            throw new IllegalArgumentException(
                    "Missing parent entity in multipart");

        Header header = parent.getHeader();
        if (header == null)
            throw new IllegalArgumentException(
                    "Missing header in parent entity");

        ContentTypeField contentType = (ContentTypeField) header
                .getField(FieldName.CONTENT_TYPE);
        if (contentType == null)
            throw new IllegalArgumentException(
                    "Content-Type field not specified");

        return contentType;
    }

    private ByteSequence getBoundary(ContentTypeField contentType) {
        String boundary = contentType.getBoundary();
        if (boundary == null)
            throw new IllegalArgumentException(
                    "Multipart boundary not specified. Mime-Type: " + contentType.getMimeType()
                            + ", Raw: " + contentType.toString());

        return ContentUtil.encode(boundary);
    }

    private void writeBytes(ByteSequence byteSequence, OutputStream out)
            throws IOException {
        if (byteSequence instanceof ByteArrayBuffer) {
            ByteArrayBuffer bab = (ByteArrayBuffer) byteSequence;
            out.write(bab.buffer(), 0, bab.length());
        } else {
            out.write(byteSequence.toByteArray());
        }
    }

}

{code}
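
The core idea of the writer above (keep a stack of encodings, encode only leaf bodies, write delimiter lines unencoded) can be sketched independently of Mime4J. Everything in the following block is hypothetical and for illustration only: the Part class is a toy stand-in for the Mime4J DOM, headers are omitted, and java.util.Base64 stands in for CodecUtil. Regarding the TODO in the Javadoc: as far as I know, RFC 2045 (section 6.4) allows only 7bit, 8bit or binary as Content-Transfer-Encoding on multipart entities, which would be consistent with never encoding the multipart structure itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Deque;
import java.util.List;

/** Illustration only: a toy part tree, not the Mime4J DOM. All names here are hypothetical. */
class LeafOnlyEncodingSketch {

    /** A part is either a leaf (content != null) or a multipart (content == null, has children). */
    static final class Part {
        final String encoding;   // "base64" or "7bit" in this sketch
        final String boundary;   // used by multiparts only
        final byte[] content;    // used by leaves only
        final List<Part> children = new ArrayList<>();

        Part(String encoding, String boundary, byte[] content) {
            this.encoding = encoding;
            this.boundary = boundary;
            this.content = content;
        }
    }

    // The encoding declared by the entity currently being written is on top of the stack.
    private final Deque<String> encodings = new ArrayDeque<>();

    void write(Part part, OutputStream out) throws IOException {
        encodings.push(part.encoding);
        try {
            if (part.content != null) {
                writeLeaf(part.content, out);
            } else {
                for (Part child : part.children) {
                    writeRaw("--" + part.boundary + "\r\n", out);   // delimiter: never encoded
                    write(child, out);
                    writeRaw("\r\n", out);
                }
                writeRaw("--" + part.boundary + "--\r\n", out);     // close delimiter: never encoded
            }
        } finally {
            encodings.pop();
        }
    }

    /** Only leaf content is encoded, and only with the encoding on top of the stack. */
    private void writeLeaf(byte[] content, OutputStream out) throws IOException {
        if ("base64".equals(encodings.peek())) {
            out.write(Base64.getMimeEncoder().encode(content));
        } else {
            out.write(content);
        }
    }

    private static void writeRaw(String s, OutputStream out) throws IOException {
        out.write(s.getBytes(StandardCharsets.US_ASCII));
    }

    /** Writes a multipart that (wrongly) declares base64 itself and has one base64 and one 7bit leaf. */
    static String demo() {
        Part root = new Part("base64", "=BOUNDARY=", null);
        root.children.add(new Part("base64", null, "Hello".getBytes(StandardCharsets.US_ASCII)));
        root.children.add(new Part("7bit", null, "World".getBytes(StandardCharsets.US_ASCII)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            new LeafOnlyEncodingSketch().write(root, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In this sketch the delimiter lines come out as --=BOUNDARY= even though the enclosing multipart declares an encoding, while the first leaf is written as SGVsbG8= and the second as plain World.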


removed source code from description ({code} tag is not supported in here, apparently). 
                
> Writing Multipart Mails with DefaultMessageWriter can produce unreadable mails
> ------------------------------------------------------------------------------
>
>                 Key: MIME4J-214
>                 URL: https://issues.apache.org/jira/browse/MIME4J-214
>             Project: JAMES Mime4j
>          Issue Type: Bug
>          Components: dom
>    Affects Versions: 0.7, 0.7.2
>         Environment: any, provable via Unit Test
>            Reporter: Matthias Huttar
>   Original Estimate: 1h
>  Remaining Estimate: 1h

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
