james-mime4j-dev mailing list archives

From "Oleg Kalnichevski (JIRA)" <mime4j-...@james.apache.org>
Subject [jira] [Commented] (MIME4J-116) Avoid duplicate parsing of header fields
Date Thu, 21 Apr 2011 14:52:05 GMT

    [ https://issues.apache.org/jira/browse/MIME4J-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022770#comment-13022770 ]

Oleg Kalnichevski commented on MIME4J-116:
------------------------------------------

It appears that this issue can only be resolved by moving the Field and FieldParser interfaces
from DOM to Core. If we want to keep a very strict separation of responsibilities between
Core and DOM (Core deals with RawFields only, whereas DOM is responsible for parsing raw fields
into complex structured fields), _some_ duplication of field parsing seems unavoidable: the Core
parser needs the content-type, content-transfer-encoding, charset and boundary bits in order to
be able to decode MIME entities. This duplication can also lead to inconsistencies in the handling
of malformed fields (such as the one recently reported by Stefano): the default message builder
may succeed in building an object model for a particular message, but the default message
formatter may fail when serialising the very same model, because some fields get re-parsed
using a stricter routine.

If we did move the Field and FieldParser interfaces to Core, however, not only could we avoid
duplicate parsing of some headers, but we could also potentially simplify the API by getting rid of
the RawField class and potentially the Maximal/DefaultBodyDescriptors. All fields would get a parser
assigned to them as soon as they are read from the MIME stream and would only need to be
parsed once, when accessed (if at all). Body descriptors could also be built lazily from properties
of individual fields. There would no longer be a reason for having both reduced (default) body
descriptors and maximal ones.
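
A rough sketch of the parse-once idea, for illustration only. None of the names
below (LazyFieldSketch, LazyField, mimeType, isMultipart) are Mime4j API; they are
hypothetical, and the toy parser is deliberately lenient and far simpler than a real
Content-Type parser. The sketch only shows a field that carries its parser from the
moment it is read, parses on first access, and lets a descriptor-like property be
derived from the field itself.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch; these types are NOT part of Mime4j.
final class LazyFieldSketch {

    /** A header field that keeps its raw body and parses it at most once, on first access. */
    static final class LazyField<T> {
        private final String name;
        private final String rawBody;
        private final Function<String, T> parser; // assigned when the field is read
        private T parsed;
        private boolean isParsed;
        private int parseCount; // present only to make the single parse observable

        LazyField(String name, String rawBody, Function<String, T> parser) {
            this.name = name;
            this.rawBody = rawBody;
            this.parser = parser;
        }

        String name() { return name; }

        String rawBody() { return rawBody; }

        /** Parses the raw body on the first call and returns the cached result afterwards. */
        T value() {
            if (!isParsed) {
                parsed = parser.apply(rawBody);
                isParsed = true;
                parseCount++;
            }
            return parsed;
        }

        int parseCount() { return parseCount; }
    }

    /** Toy, lenient parser (an assumption): the media type is everything before the first ';'. */
    static String mimeType(String body) {
        int semi = body.indexOf(';');
        String type = semi < 0 ? body : body.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** A body-descriptor-like property derived lazily from the field itself. */
    static boolean isMultipart(LazyField<String> contentType) {
        return contentType.value().startsWith("multipart/");
    }

    public static void main(String[] args) {
        LazyField<String> ct = new LazyField<>(
                "Content-Type", "Multipart/Mixed; boundary=\"xyz\"", LazyFieldSketch::mimeType);
        System.out.println(ct.parseCount()); // 0: nothing parsed yet
        System.out.println(isMultipart(ct)); // true: triggers the only parse
        System.out.println(ct.value());      // multipart/mixed, served from the cache
        System.out.println(ct.parseCount()); // 1
    }
}
```

Because both the stream-level check (isMultipart) and any later consumer go through
the same value() call, they see the result of a single parse, which is what would
remove the lenient/strict mismatch described above.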

If I hear no objections, I'll go ahead and experiment with the idea of moving field parsing
interfaces to Core.

Oleg

> Avoid duplicate parsing of header fields
> ----------------------------------------
>
>                 Key: MIME4J-116
>                 URL: https://issues.apache.org/jira/browse/MIME4J-116
>             Project: JAMES Mime4j
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Markus Wiederkehr
>             Fix For: 0.7
>
>
> Currently some header fields are parsed twice when building a DOM. Once by DefaultBodyDescriptor
> or MaximalBodyDescriptor and a second time by MessageBuilder using Field.parse().
> Also different parsers are used in both stages. The body descriptors use handcrafted
> parsers whereas Field.parse uses JavaCC generated parsers. The handcrafted version does not
> seem to handle comments in a header correctly.
> The situation should be improved by parsing a header field only once and passing that
> already parsed field to a content handler. Also only one sort of field parser should be used;
> either handcrafted or generated. My personal opinion is that it might be easier for a handcrafted
> parser to be more tolerant against malformed header fields.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
