james-mime4j-dev mailing list archives

From Robert Burrell Donkin <robertburrelldon...@gmail.com>
Subject Re: [jira] Commented: (MIME4J-118) MIME stream parser handles non-ASCII fields incorrectly
Date Tue, 24 Feb 2009 21:23:45 GMT
On Tue, Feb 24, 2009 at 7:59 PM, Markus Wiederkehr
<markus.wiederkehr@gmail.com> wrote:
> On Tue, Feb 24, 2009 at 2:46 PM, Robert Burrell Donkin (JIRA)
> <mime4j-dev@james.apache.org> wrote:
>>    [ https://issues.apache.org/jira/browse/MIME4J-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676270#action_12676270
>> Robert Burrell Donkin commented on MIME4J-118:
>> ----------------------------------------------
>> I suspect that there may be longer-term issues with this general approach, but I think
>> we should accept that the current proposal is good enough for this release. Release early,
>> release often.
> +1 on the release part but I need a few days to clean up that patch.


>> I think that the best way to approach this is to preserve the original document together
>> with boundary metadata: for example, recording that a 'Content-Type' header starts at byte 99
>> in the document, rather than slicing up the document and reassembling it from lots of small
>> byte buffers. But this is related to other issues which should wait until after this release,
>> so I think we should patch and look to ship.
> We can cross that bridge when we come to it but I don't particularly
> like the idea of having to open a file, seek to position 99 and read
> 50 bytes just to obtain the raw value of a Content-Type field, for
> example.

nio manages this quite adequately ;-)
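
As a rough illustration of the NIO point, here is a minimal sketch that reads a field's raw bytes given only a byte offset and a length, using the positional FileChannel.read(ByteBuffer, long), which reads at an absolute position without an explicit seek. The class and method names, the tiny message and its offsets are all made up for the example; this is not mime4j API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of mime4j.
final class RawFieldReader {

    // Reads `length` bytes starting at absolute byte `offset`. The positional
    // FileChannel.read(ByteBuffer, long) does not move the channel's own
    // position, so no explicit seek is needed.
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            // A single read may return fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new IOException("end of file reached at byte " + pos);
                }
                pos += n;
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("message", ".eml");
        try {
            String msg = "Subject: hi\r\nContent-Type: text/plain\r\n\r\nbody\r\n";
            Files.write(tmp, msg.getBytes(StandardCharsets.US_ASCII));
            // "Subject: hi\r\n" is 13 bytes, so Content-Type starts at byte 13
            // and its raw text (without the CRLF) is 24 bytes long.
            byte[] raw = readRange(tmp, 13, 24);
            System.out.println(new String(raw, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

This sketch opens the file on every call, which is exactly the cost Markus objects to; a longer-lived design would more likely keep the channel open or memory-map the document once with FileChannel.map.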

i worry about the amount of copying and the number of new buffers that
will need to be created to store a single large, complex document when
every component has to be stored both as a string and as bytes to
ensure round tripping in non-compliant corner cases. i would much rather
encourage users to retain the original when absolute fidelity is required.
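
One possible reading of "retain the original" is sketched below: keep the message bytes once, store each header field only as an offset and length into them, and hand out read-only views instead of copies. FieldSpan and HeaderIndex are hypothetical names for this sketch, not mime4j classes; the scanner assumes CRLF line endings, treats lines starting with a space or tab as folded continuations, and only indexes the top-level header.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a header field recorded only as (offset, length) into the
// original message bytes, instead of as a copied String plus byte[].
final class FieldSpan {
    final int offset;
    final int length;

    FieldSpan(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    // Read-only view of the raw field bytes; shares the original, copies nothing.
    ByteBuffer view(ByteBuffer original) {
        ByteBuffer dup = original.asReadOnlyBuffer();
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();
    }
}

final class HeaderIndex {

    // Records the span of every field in the header section (up to the first
    // empty line). Lines starting with SP or HTAB are folded continuations of
    // the previous field. Assumes CRLF line endings.
    static List<FieldSpan> index(byte[] msg) {
        List<FieldSpan> spans = new ArrayList<>();
        int fieldStart = -1; // start of the field being accumulated, -1 if none
        int fieldEnd = -1;   // end (exclusive, before CRLF) of its last line
        int pos = 0;
        while (pos < msg.length) {
            int eol = findCrlf(msg, pos);
            int lineEnd = (eol < 0) ? msg.length : eol;
            boolean empty = lineEnd == pos;
            boolean folded = !empty && (msg[pos] == ' ' || msg[pos] == '\t');
            if (!folded) {
                // A new field or the empty separator line closes the previous field.
                if (fieldStart >= 0) {
                    spans.add(new FieldSpan(fieldStart, fieldEnd - fieldStart));
                }
                if (empty) {
                    return spans;
                }
                fieldStart = pos;
            }
            fieldEnd = lineEnd;
            pos = (eol < 0) ? msg.length : eol + 2;
        }
        // Header section with no terminating empty line.
        if (fieldStart >= 0) {
            spans.add(new FieldSpan(fieldStart, fieldEnd - fieldStart));
        }
        return spans;
    }

    private static int findCrlf(byte[] msg, int from) {
        for (int i = from; i + 1 < msg.length; i++) {
            if (msg[i] == '\r' && msg[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] msg = "Subject: hi\r\nX-Note: a\r\n b\r\n\r\nbody"
                .getBytes(StandardCharsets.US_ASCII);
        ByteBuffer original = ByteBuffer.wrap(msg);
        for (FieldSpan span : index(msg)) {
            String raw = StandardCharsets.US_ASCII.decode(span.view(original)).toString();
            System.out.println(span.offset + "+" + span.length + ": " + raw);
        }
    }
}
```

The trade-off of this design is that the whole original buffer stays reachable for as long as any view into it is held.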

> Also please mind that Field instances may be shared between multiple
> messages and they can be created from a constructor or factory without
> an original document to back them up.

the difficult problems with round tripping should not occur when
fields are created programmatically
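
To make that distinction concrete, a hypothetical field type (not mime4j's Field API) might keep only the original bytes for parsed fields and generate bytes for fields built in code, so the round-tripping concern only applies to the parsed kind. Decoding the parsed value as ISO-8859-1 is an assumption of this sketch, chosen because it maps every byte to one char without loss.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical field type. A parsed field keeps only its original bytes and
// decodes the value on demand; a field created in code has no original
// document, so its bytes are generated from the name and value.
final class SketchField {
    private final String name;
    private final String value; // null for parsed fields
    private final byte[] raw;   // null for programmatic fields

    private SketchField(String name, String value, byte[] raw) {
        this.name = name;
        this.value = value;
        this.raw = raw;
    }

    // Created by application code: nothing to round trip.
    static SketchField of(String name, String value) {
        return new SketchField(name, value, null);
    }

    // Produced by a parser from the exact bytes of one (possibly non-compliant) field.
    static SketchField parsed(byte[] raw) {
        String line = new String(raw, StandardCharsets.ISO_8859_1);
        int colon = line.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("no field name before ':'");
        }
        return new SketchField(line.substring(0, colon), null, raw.clone());
    }

    String getName() {
        return name;
    }

    String getValue() {
        if (value != null) {
            return value;
        }
        // Assumption for this sketch: ISO-8859-1 maps each byte to one char.
        String line = new String(raw, StandardCharsets.ISO_8859_1);
        return line.substring(line.indexOf(':') + 1).trim();
    }

    // Bytes written when the message is serialized again.
    byte[] toBytes() {
        if (raw != null) {
            return raw.clone(); // exact original bytes
        }
        // Generated fields are assumed to hold ASCII-only, unfolded values.
        return (name + ": " + value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] wire = "Subject: caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        SketchField parsedField = SketchField.parsed(wire);
        SketchField built = SketchField.of("X-Mailer", "example");
        System.out.println(parsedField.getName() + " -> " + parsedField.getValue());
        System.out.println(new String(built.toBytes(), StandardCharsets.US_ASCII));
    }
}
```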

> And last but not least, with nested encodings there is no meaningful
> offset into a file.

i'm not sure i agree with that

IIRC in a multipart document the MIME headers must be encoded in
ASCII, so the first-level headers can all be accessed through byte
offsets. a part may contain a transfer-encoded document. there are a
couple of distinct cases which are interesting: when the document is
an embedded message and when it is an embedded multipart document.
when this is encoded in Base64, a bytewise offset is not available in
the original stream but is available in the decoded stream, so the
bytewise offset into the decoded stream can be used. this is a rare use
case, so although the approach would be slow here, that cost would only
be paid rarely.
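
The two-level addressing described above might look roughly like this: the encoded body is located by offsets into the outer document, and the header by offsets into the decoded bytes of that body. NestedLocator and its fields are hypothetical names, the "outer document" in main is a toy, and the sketch deliberately takes the slow path of decoding the whole part on every access.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical locator for a header inside a Base64-encoded embedded message.
final class NestedLocator {
    final int partOffset;    // start of the encoded body in the outer document
    final int partLength;    // length of the encoded body
    final int decodedOffset; // start of the header within the decoded body
    final int decodedLength; // length of the header within the decoded body

    NestedLocator(int partOffset, int partLength, int decodedOffset, int decodedLength) {
        this.partOffset = partOffset;
        this.partLength = partLength;
        this.decodedOffset = decodedOffset;
        this.decodedLength = decodedLength;
    }

    // Slow path: decode the whole part on every access, then slice the result.
    byte[] read(byte[] outer) {
        byte[] encoded = Arrays.copyOfRange(outer, partOffset, partOffset + partLength);
        // The MIME decoder ignores line separators inside the encoded body.
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        return Arrays.copyOfRange(decoded, decodedOffset, decodedOffset + decodedLength);
    }

    public static void main(String[] args) {
        String inner = "Content-Type: text/plain\r\n\r\nhello\r\n";
        String encoded = Base64.getMimeEncoder()
                .encodeToString(inner.getBytes(StandardCharsets.US_ASCII));
        String prefix = "Content-Transfer-Encoding: base64\r\n\r\n";
        byte[] outer = (prefix + encoded).getBytes(StandardCharsets.US_ASCII);
        // The embedded Content-Type field is the first 24 bytes of the decoded body.
        NestedLocator loc = new NestedLocator(prefix.length(), encoded.length(), 0, 24);
        System.out.println(new String(loc.read(outer), StandardCharsets.US_ASCII));
    }
}
```

Caching the decoded part would remove most of the cost, but for a rare case the simple version may be acceptable.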

- robert
