james-mime4j-dev mailing list archives

From "Stefano Bagnara (JIRA)" <mime4j-...@james.apache.org>
Subject [jira] Commented: (MIME4J-58) Lenient dealing with headless messages or malformed header/body separation
Date Thu, 31 Dec 2009 00:21:29 GMT

    [ https://issues.apache.org/jira/browse/MIME4J-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795503#action_12795503 ]

Stefano Bagnara commented on MIME4J-58:
---------------------------------------

So, I managed to alter LineReaderInputStream to allow a ByteArrayBuffer to be "unread". Only
BufferedLineReaderInputStream supports unreading, which should be enough because we always have
that class while reading headers.
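
For readers unfamiliar with "unreading": the sketch below shows the same idea using the JDK's
java.io.PushbackInputStream rather than the mime4j classes named above. The class and method names
(UnreadSketch, peekThenReadAll) are made up for illustration, and the actual change to
BufferedLineReaderInputStream may work quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not mime4j code.
class UnreadSketch {
    /** Reads up to n bytes, pushes them back, then reads the whole stream. */
    static String peekThenReadAll(String input, int n) {
        byte[] data = input.getBytes(StandardCharsets.US_ASCII);
        // The pushback buffer must be large enough for everything we unread.
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(data), Math.max(1, n))) {
            byte[] peeked = new byte[n];
            int got = in.read(peeked, 0, n);
            if (got > 0) {
                in.unread(peeked, 0, got);   // the next read sees these bytes again
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After the unread, the caller gets the complete input back, which is the property the header
parser relies on when it hands a rejected line over to the body.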

The implementation is a bit tricky and could easily fail, but I added some runtime
exceptions to guard against the most obvious bugs. They are never raised when parsing our test
messages, so it should be ok.

The idea is that when parseField() fills a new fieldBuffer, it tries to parse it as a header.
If that fails, it can call stream.unread(fieldBuffer) and return "false", so that the token parser
runs END_HEADERS and parses the same buffer again as the beginning of the content.
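
A minimal sketch of that control flow follows. It is not the mime4j code: a queue of lines stands
in for the stream, putting a line back on the queue stands in for stream.unread(fieldBuffer), and
LenientHeaderSketch, looksLikeHeader and the State enum are hypothetical names. Folded
(multiline) headers are deliberately left out to keep it short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not mime4j code.
class LenientHeaderSketch {
    enum State { HEADERS, END_HEADERS, BODY }

    final List<String> headers = new ArrayList<>();
    final List<String> body = new ArrayList<>();
    private final Deque<String> lines;   // stands in for the input stream

    LenientHeaderSketch(String message) {
        // limit -1 keeps trailing empty lines
        lines = new ArrayDeque<>(Arrays.asList(message.split("\r\n", -1)));
    }

    // Simplified check (an assumption): a name without whitespace or ':', optional WSP, then ':'.
    static boolean looksLikeHeader(String line) {
        return line.matches("[^\\s:]+[ \\t]*:.*");
    }

    /** Returns true if a header field was consumed, false if the headers are over. */
    boolean parseField() {
        if (lines.isEmpty()) {
            return false;
        }
        String line = lines.removeFirst();
        if (line.isEmpty()) {
            return false;                 // regular blank-line separator, consumed
        }
        if (!looksLikeHeader(line)) {
            lines.addFirst(line);         // "unread": the body will see this line again
            return false;
        }
        headers.add(line);
        return true;
    }

    void parse() {
        State state = State.HEADERS;
        while (state != State.BODY) {
            switch (state) {
                case HEADERS:
                    if (!parseField()) {
                        state = State.END_HEADERS;
                    }
                    break;
                case END_HEADERS:
                    state = State.BODY;   // a real parser would emit an end-of-headers event here
                    break;
                default:
                    break;
            }
        }
        body.addAll(lines);               // everything left, including any unread line
    }
}
```

With the second example message from the issue, this yields a single header (Subject) and a
three-line body starting at "This is an invalid header".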

So, the feature works, but I'm not sure whether this should be the default behaviour, an
optional behaviour, or simply the only behaviour, with the old one removed altogether.

What do you think?

PS: simply put, the implemented behaviour is "only valid headers are allowed" (valid-chars-for-the-name
*WSP ":"). The first line that does not start with a valid header name, *WSP, ":" is considered
already part of the body (a virtual CRLF is inserted just before it).
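
One way to express that rule as a check is sketched below. It assumes the "valid chars for the
name" are the RFC 5322 field-name characters (printable US-ASCII 33-126 except ":"); the exact set
used by the patch may differ, and HeaderLineCheck / startsWithValidHeader are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not mime4j code.
class HeaderLineCheck {
    // Assumed name characters: printable US-ASCII (0x21-0x7E) except ':' (0x3A),
    // followed by optional WSP (space or tab) and a ':'.
    private static final Pattern HEADER_START =
            Pattern.compile("^[\\x21-\\x39\\x3B-\\x7E]+[ \\t]*:");

    /** True if the line starts with name, *WSP, ":". */
    static boolean startsWithValidHeader(String line) {
        return HEADER_START.matcher(line).find();
    }
}
```

Under this check "Subject: this is a subject" and "Subject : x" are accepted, while
"This is an invalid header" is rejected and would become the first body line.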


> Lenient dealing with headless messages or malformed header/body separation
> --------------------------------------------------------------------------
>
>                 Key: MIME4J-58
>                 URL: https://issues.apache.org/jira/browse/MIME4J-58
>             Project: JAMES Mime4j
>          Issue Type: Task
>    Affects Versions: 0.3
>            Reporter: Stefano Bagnara
>             Fix For: 0.8
>
>         Attachments: headerbody-nocrlfcrlf.msg, headerbody-noheader.msg
>
>
> Define how to deal with non-canonical messages like this one:
> -----------------------
> This is a simple message not having headers.
> The whole text should be recognized as body.
> -----------------------
> or this one:
> -----------------------
> Subject: this is a subject
> This is an invalid header
> AnotherHeader: is this an header or the first part of the body?
> Body text
> -----------------------
> In the first case mime4j outputs an "invalid header" error twice, and a roundtrip write
> results in an empty message.
> In the SMTP case this is unfortunate because messages are sometimes sent without headers.
> In the second case mime4j currently takes Subject and AnotherHeader as headers, "This is an
> invalid header" raises a monitor for "invalid header", and "Body text" is considered the body.
> A compromise we evaluated in the past between compliance, leniency and performance was to
> "alter" the requirement for CRLFCRLF between headers and body with a different rule: if, while
> parsing the headers, we find a line (not part of a multiline header) that does not include a
> "HeaderName: something", then we virtually add a CRLF *before* that line and consider it the
> first line of the body. This allows us to buffer only a single line (as opposed to scanning
> the whole message for a CRLFCRLF and treating the full message as body if none is found) and
> to be very lenient with input. The "side effect" (maybe not bad) is that a wrong header in
> the middle of the headers will result in some headers being moved to the body.
> With this algorithm the above would be "virtually" parsed as if it were:
> -----------------------
>
> This is a simple message not having headers.
> The whole text should be recognized as body.
> -----------------------
> or this one:
> -----------------------
> Subject: this is a subject
>
> This is an invalid header
> AnotherHeader: is this an header or the first part of the body?
> Body text
> -----------------------
> If we think about strict and lenient approaches, I think that the current mime4j result is
> ok when using strict parsing, while the one I propose is a good lenient alternative.
> Opinions? Alternatives?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

