james-mime4j-dev mailing list archives

From Markus Wiederkehr <markus.wiederk...@gmail.com>
Subject Duplicate parsing of some header fields
Date Fri, 30 Jan 2009 19:37:08 GMT
When Mime4j is used to build a DOM, some header fields are parsed more
than once.

This is what happens:
 * AbstractEntity.parseField parses the raw string into a name-value pair
 * AbstractEntity.parseField calls MutableBodyDescriptor.addField for
valid fields
 * DefaultBodyDescriptor parses header fields such as Content-Type
 * MaximalBodyDescriptor parses additional header fields such as Mime-Version
 * eventually MimeStreamParser.parse retrieves the raw field from
AbstractEntity and notifies a ContentHandler
 * the ContentHandler (again) has to parse the raw string into name and value
 * the ContentHandler (again) has to parse certain structural fields
already parsed by a body descriptor

When building a DOM, the last two steps are performed by
MessageBuilder, which calls Field.parse().
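
To make the duplication concrete, here is a rough sketch of the flow described above. All class and method names in it (DuplicateParsingDemo, RawFieldSplitter, describe, handle) are hypothetical stand-ins and not Mime4j's actual classes; the parsing is deliberately simplified (no header folding, no quoted parameters). It only shows that a handler which receives the raw string has to repeat the work the descriptor already did.

```java
import java.util.Locale;

// Hypothetical stand-ins for illustration only; these are not Mime4j's real classes.
class DuplicateParsingDemo {

    /** Splits raw "Name: value" lines and counts how often it is asked to. */
    static class RawFieldSplitter {
        int calls = 0;

        String[] split(String raw) {
            calls++;
            int colon = raw.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Not a header field: " + raw);
            }
            return new String[] {
                raw.substring(0, colon).trim(),
                raw.substring(colon + 1).trim()
            };
        }
    }

    /** Extracts the media type from a Content-Type value, ignoring parameters. */
    static String mimeTypeOf(String value) {
        int semicolon = value.indexOf(';');
        String type = semicolon < 0 ? value : value.substring(0, semicolon);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Plays the role of the entity plus body descriptor: the first parse. */
    static String describe(RawFieldSplitter splitter, String raw) {
        String[] nameValue = splitter.split(raw);
        return nameValue[0].equalsIgnoreCase("Content-Type") ? mimeTypeOf(nameValue[1]) : null;
    }

    /** Plays the role of the content handler: it only gets the raw string, so it parses again. */
    static String handle(RawFieldSplitter splitter, String raw) {
        String[] nameValue = splitter.split(raw);
        return nameValue[0].equalsIgnoreCase("Content-Type") ? mimeTypeOf(nameValue[1]) : null;
    }

    public static void main(String[] args) {
        RawFieldSplitter splitter = new RawFieldSplitter();
        String raw = "Content-Type: text/plain; charset=us-ascii";
        String fromDescriptor = describe(splitter, raw);
        String fromHandler = handle(splitter, raw);
        System.out.println(fromDescriptor + " / " + fromHandler + ", splits: " + splitter.calls);
    }
}
```

Both stages arrive at the same media type, but the splitter is invoked twice for a single header line.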

There are several issues here:
 * the ContentHandler has to repeat a lot of work that AbstractEntity
has already done.
 * Field.parse() uses a JavaCC-generated parser whereas the body
descriptors have their own parsing code.
 * we have unnecessary duplicate code and the two parsers are very
likely inconsistent with each other.
 * parsing issues have to be addressed twice.

To resolve these problems I would like to propose a drastic change to the API:
 * AbstractEntity should use Field.parse to parse the raw field
 * The body descriptors can extract information from the Field
instances without needing their own parsing code
 * A ContentHandler would be notified of a Field instead of a string
 * MessageBuilder could simply store the field and would not have to
parse it again.
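
A minimal sketch of what the proposed flow could look like, assuming the raw field is parsed exactly once and the resulting object is handed to both the descriptor and the handler. The names ParseOnceDemo, ParsedField, FieldParser, FieldHandler, Descriptor and StoringHandler are invented for this illustration and do not reflect the existing or future Mime4j API; the parsing is again simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the proposed flow; names are illustrative, not Mime4j's API.
class ParseOnceDemo {

    /** A field that has been split into name and body exactly once. */
    static final class ParsedField {
        final String name;
        final String body;

        ParsedField(String name, String body) {
            this.name = name;
            this.body = body;
        }

        /** Media type of a Content-Type body, ignoring parameters (simplified). */
        String mediaType() {
            int semicolon = body.indexOf(';');
            String type = semicolon < 0 ? body : body.substring(0, semicolon);
            return type.trim().toLowerCase(Locale.ROOT);
        }
    }

    /** The single place where raw strings are parsed; counts its invocations. */
    static final class FieldParser {
        int calls = 0;

        ParsedField parse(String raw) {
            calls++;
            int colon = raw.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Not a header field: " + raw);
            }
            return new ParsedField(raw.substring(0, colon).trim(), raw.substring(colon + 1).trim());
        }
    }

    /** Handlers are notified of a parsed field instead of a string. */
    interface FieldHandler {
        void field(ParsedField field);
    }

    /** The descriptor reads from the field and has no parsing code of its own. */
    static final class Descriptor {
        String mimeType;

        void addField(ParsedField field) {
            if (field.name.equalsIgnoreCase("Content-Type")) {
                mimeType = field.mediaType();
            }
        }
    }

    /** A DOM-builder-like handler that simply stores what it is given. */
    static final class StoringHandler implements FieldHandler {
        final List<ParsedField> fields = new ArrayList<>();

        @Override
        public void field(ParsedField field) {
            fields.add(field);
        }
    }

    /** Parses once, then shares the same instance with descriptor and handler. */
    static void process(FieldParser parser, String raw, Descriptor descriptor, FieldHandler handler) {
        ParsedField field = parser.parse(raw);
        descriptor.addField(field);
        handler.field(field);
    }

    public static void main(String[] args) {
        FieldParser parser = new FieldParser();
        Descriptor descriptor = new Descriptor();
        StoringHandler handler = new StoringHandler();
        process(parser, "Content-Type: text/html; charset=utf-8", descriptor, handler);
        System.out.println(descriptor.mimeType + ", parses: " + parser.calls
                + ", stored: " + handler.fields.size());
    }
}
```

The point of the design is that the handler stores the very instance the descriptor already inspected, so any parsing fix lives in one place.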

Please note that, with the recent changes to the API, all concrete Field
classes parse the field value lazily, so there should be no significant
performance impact.
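
For readers unfamiliar with the idea, lazy parsing can be sketched roughly as follows. LazyFieldDemo and LazyContentTypeField are hypothetical names and the code is not taken from Mime4j; it only illustrates that constructing the field costs nothing beyond storing the body, and that the structured value is computed on first access and then cached.

```java
import java.util.Locale;

// Hypothetical sketch of a lazily parsed field; not Mime4j's actual Field implementation.
class LazyFieldDemo {

    static final class LazyContentTypeField {
        private final String body;
        private String mimeType;
        private boolean parsed = false;
        int parseCount = 0; // exposed only so the laziness can be observed

        LazyContentTypeField(String body) {
            this.body = body; // no parsing happens here
        }

        /** Returns the unparsed body; never triggers parsing. */
        String getBody() {
            return body;
        }

        /** Parses the body on first call and returns the cached result afterwards. */
        String getMimeType() {
            if (!parsed) {
                parse();
            }
            return mimeType;
        }

        private void parse() {
            parseCount++;
            int semicolon = body.indexOf(';');
            String type = semicolon < 0 ? body : body.substring(0, semicolon);
            mimeType = type.trim().toLowerCase(Locale.ROOT);
            parsed = true;
        }
    }

    public static void main(String[] args) {
        LazyContentTypeField field = new LazyContentTypeField("Multipart/Mixed; boundary=abc");
        System.out.println("before access: " + field.parseCount);
        field.getMimeType();
        field.getMimeType();
        System.out.println(field.getMimeType() + ", after access: " + field.parseCount);
    }
}
```

A handler that only stores or forwards the field never pays for the structured parse at all.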

I also think this could help with MIME4J-69.

