james-mime4j-dev mailing list archives

From "Stefano Bagnara (JIRA)" <mime4j-...@james.apache.org>
Subject [jira] [Commented] (MIME4J-116) Avoid duplicate parsing of header fields
Date Tue, 21 Jun 2011 10:38:47 GMT

    [ https://issues.apache.org/jira/browse/MIME4J-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052463#comment-13052463 ]

Stefano Bagnara commented on MIME4J-116:

Hi Oleg, I finally had the time to review this code. I can't understand why FieldParser is
needed in the whole AbstractEntity/MimeEntity chain as it is only used once, just before the
MutableBodyDescriptor.addField() method call. As we know MutableBodyDescriptor is already
pluggable, why don't we simply leave the parsing job to this object? 

This way the FieldParser interface can still live in the dom package, together with the
rest of the pluggable code related to field parsing and with the "advanced"
MutableBodyDescriptor implementations.

In fact, using the DefaultBodyDescriptor with the FieldParser doesn't currently make sense
because it will parse the field once in the FieldParser and then ignore the parsed field data
by using, for example, the DefaultBodyDescriptor.parseContentType method, which recreates a
RawField starting from the parsed field.

If we move the "FieldParser" logic to the body descriptor, we make the design clearer and
put the code where it is actually used. Also, this way MutableBodyDescriptor.addField can
be tied more closely to the RawField object, since we know it works on raw data (it doesn't
make sense to accept a Field and then wrap a non-RawField in a new RawField, giving the user
a false sense of optimization).
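
To make the idea concrete, here is a minimal sketch of a body descriptor that accepts only raw fields and does its own parsing, exactly once. All types below are simplified, hypothetical stand-ins that reuse the names from this discussion; they are not the actual mime4j classes, and the Content-Type parsing is deliberately naive.

```java
import java.util.Locale;

// Hypothetical stand-in for a raw header field: an unparsed name/body pair.
// It is NOT the real org.apache.james.mime4j RawField class.
final class RawField {
    private final String name;
    private final String body;

    RawField(String name, String body) {
        this.name = name;
        this.body = body;
    }

    String getName() { return name; }
    String getBody() { return body; }
}

// Hypothetical descriptor that accepts only raw fields and parses each one once.
final class SketchBodyDescriptor {
    private String mimeType = "text/plain";
    private String charset = "us-ascii";

    // The caller hands over the raw field; no pre-parsed Field is involved.
    void addField(RawField field) {
        String name = field.getName().toLowerCase(Locale.ROOT);
        if (name.equals("content-type")) {
            parseContentType(field.getBody());
        }
        // Other fields are ignored in this sketch.
    }

    // Deliberately naive: splits on ';' and ignores comments and quoted ';'.
    private void parseContentType(String body) {
        String[] parts = body.split(";");
        String type = parts[0].trim().toLowerCase(Locale.ROOT);
        if (!type.isEmpty()) {
            mimeType = type;
        }
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            String key = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = parts[i].substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip surrounding quotes
            }
            if (key.equals("charset")) {
                charset = value.toLowerCase(Locale.ROOT);
            }
        }
    }

    String getMimeType() { return mimeType; }
    String getCharset() { return charset; }
}

final class BodyDescriptorSketchDemo {
    public static void main(String[] args) {
        SketchBodyDescriptor descriptor = new SketchBodyDescriptor();
        descriptor.addField(new RawField("Content-Type", "text/html; charset=\"UTF-8\""));
        System.out.println(descriptor.getMimeType() + " / " + descriptor.getCharset());
    }
}
```

With this shape, the entity classes never need to know about a FieldParser: a more capable descriptor could plug one in internally without changing the addField(RawField) contract.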

Also, moving FieldParser to dom will let us change its signature from "FieldParser<T
extends Field>" to a stricter "FieldParser<T extends ParsedField>".
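
A rough illustration of what the stricter bound would buy us, using simplified, hypothetical versions of the interfaces (the real ones have more methods, and ContentLengthField/ContentLengthParser here are invented for the example only):

```java
// Hypothetical, simplified stand-ins for the interfaces discussed above.
interface Field {
    String getName();
    String getBody();
}

interface ParsedField extends Field {
    boolean isValidField();
}

// The stricter bound: a parser can only be declared for ParsedField subtypes.
interface FieldParser<T extends ParsedField> {
    T parse(String name, String body);
}

// Example-only parsed field carrying a numeric value.
final class ContentLengthField implements ParsedField {
    private final String name;
    private final String body;
    private final long contentLength; // -1 when the body is not a valid number
    private final boolean valid;

    ContentLengthField(String name, String body) {
        this.name = name;
        this.body = body;
        long parsed = -1;
        boolean ok = false;
        try {
            parsed = Long.parseLong(body.trim());
            ok = parsed >= 0;
        } catch (NumberFormatException e) {
            // Leave the field marked as invalid instead of failing.
        }
        this.contentLength = ok ? parsed : -1;
        this.valid = ok;
    }

    public String getName() { return name; }
    public String getBody() { return body; }
    public boolean isValidField() { return valid; }
    long getContentLength() { return contentLength; }
}

final class ContentLengthParser implements FieldParser<ContentLengthField> {
    public ContentLengthField parse(String name, String body) {
        return new ContentLengthField(name, body);
    }
}

final class FieldParserBoundSketch {
    public static void main(String[] args) {
        FieldParser<ContentLengthField> parser = new ContentLengthParser();
        ContentLengthField field = parser.parse("Content-Length", " 42 ");
        System.out.println(field.getContentLength() + " valid=" + field.isValidField());
        // FieldParser<Field> would be rejected by the compiler:
        // Field is not a subtype of ParsedField.
    }
}
```

The point is only that the compiler then guarantees every FieldParser yields something that is really parsed, so a parser that just hands back a raw field can no longer be declared.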

I tried to generate a diff, but it is hard to understand. I will try to put the change in a
branch to better show what I mean and to let you review it.

> Avoid duplicate parsing of header fields
> ----------------------------------------
>                 Key: MIME4J-116
>                 URL: https://issues.apache.org/jira/browse/MIME4J-116
>             Project: JAMES Mime4j
>          Issue Type: Improvement
>    Affects Versions: 0.6
>            Reporter: Markus Wiederkehr
>             Fix For: 0.7
> Currently some header fields are parsed twice when building a DOM. Once by DefaultBodyDescriptor
> or MaximalBodyDescriptor and a second time by MessageBuilder using Field.parse().
> Also different parsers are used in both stages. The body descriptors use handcrafted
> parsers whereas Field.parse uses JavaCC generated parsers. The handcrafted version does not
> seem to handle comments in a header correctly.
> The situation should be improved by parsing a header field only once and passing that
> already parsed field to a content handler. Also only one sort of field parser should be used;
> either handcrafted or generated. My personal opinion is that it might be easier for a handcrafted
> parser to be more tolerant against malformed header fields.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

