james-mime4j-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Burrell Donkin <robertburrelldon...@gmail.com>
Subject Re: Field, RawField, ParsedField and parsing methods
Date Mon, 28 Dec 2009 14:19:26 GMT
On Mon, Dec 28, 2009 at 12:02 PM, Stefano Bagnara <apache@bago.org> wrote:
> Hi all,
> I've just reviewed some mime4j code about parsing and I find the
> current behaviour not so intuitive.

i suspect most people agree on this :-)

we've struggled to find the right balance between power, performance
and usability. IMHO we haven't yet succeeded.

> 1) We have a "Field" interface, a RawField and a ParsedField. Most
> code deal with generic Fields but knows when it is a parsedfield or a
> rawfield. Nowhere we check the Field implementation to understand if
> it is already parsed or not, so it seems we always know when it is a
> parsedfield and when it is a rawfield. Some code calling getName does
> a trim and a lowercase, some other code simply lowercase without
> trimming. Why don't we simply canonicalize things in getName and
> publish a clear contract about what getName returns?

IIRC performance (some downstream application don't care about
canonicalisation and don't want to pay the cost) and power (some
downstream apps require uncanonicalised input - this is a requirement
for round tripping in particular)

perhaps a more expressive interface might work better but would
probably require more object creation

> 2) We have a FieldParser interface (that I just generified to
> FieldParser<T extends ParsedField>) that create ParsedField instances
> starting from String name, String body, ByteSequence raw. This seems
> "weird" to me. name and body are already a parsing step beyond raw.
> "FieldParser" is implemented by fields implementation and always call
> the constructor for the field with the same 3 parameters. All of the
> fields, in facts, simply parse "body" and bring around "raw" only as
> convenience way to avoid recreating (reencoding, refolding) stuff when
> writing to a mime stream.
> We have 3 callers for FieldParser.parse method:
> A. Fields.parse(name, body): this method generate a raw sequence
> before parsing (ContentUtil.encode(MimeUtil.fold(fieldname+":
> "+fieldbody))).
> B. DefaultFieldParser.parse(ByteSequence raw, String rawStr): this
> method, instead, generate name and body starting from raw (So, the
> same functionality as RawField, but implemented differently).
> C. DelegatingFieldParser.parse : this is simply a delegator, called in
> the above 2 places (A&B).
> The DefaultFieldParser.parse method is always called via
> DefaultFieldParser.parse(ByteSequence) method, and this method is
> called by MessageBuilder and SimpleContentHandler field methods (they
> are our 2 implementations of ContentHandler). So, the
> ContentHandler.field(Field) method is called by the
> MimeStreamParser.parse(InputStream) method. Another weird thing: the
> MimeStreamParser already do some parsing, because it checks for the
> ":" and build a RawField from the ByteArrayBuffer and the colon
> position. So:
> - AbstractEntity.parseField parse a ByteArrayBuffer to find the ":"
> and build a RawField.
> - MimeStreamParser calls the ContentHandler passing the RawField as a
> generic "Field"
> - Both ContentHandlers we distribute runs a
> DefaultFieldParser.parse(field.getRaw) looking again for the ":" from
> "raw" data and returning a ParsedField under the Field interface.
> Isn't it weird that we work with Field interface, RawField
> implementation but we simply "drop" any information but the
> ByteSequence and parse the field again from scratch (getRaw) in order
> to return "again" a Field (that this time is ParsedField but we don't
> tell this to anyone)?
> Also AbstractField tries to be optimized as it use final name/body/raw
> fields. So AbstractFields is an immutable bean, but in practice, every
> extension of AbstractField uses lazy parsing (exept the
> ContentTransferEncodingField) so that it is functionally immutable but
> not really immutable for the jvm. IMO we should make a choice and
> embrace it. In order to alter the headers currently an user is
> expected to parse it and then to create a new raw representation of
> the new header and parse it to generate the new header to replace the
> old one. Either we make the fields mutable so to make it easier to
> alter them, or we make them immutable for real, so the JVM optimize
> their use, don't you agree? (I mean that we could introduce an
> interface for each field, and have 2 implementations for each of them:
> one parsed and one raw, maybe referencing the underlying RawField. The
> raw one could even be a simply proxy that encapsulate the RawField and
> create an instance of the parsed field only when a method is
> requested).

it's important to remember that there are downstream applications that
use the methods and classes directly. so, even if a method does not
seem to be used in Mime4J, it may have been added to facilitate a
downstream use case. equally, it could be legacy. hard to tell since
everything's bundled up together.

> 3) I fail to understand what is the contract between mime4j library
> and its users wrt to fields and fieldsparsing.
> Let say I have to parse a ContentType: am I expected to call
> DefaultFieldParser.parse ? Fields.contentType and other methods?

whichever is most appropriate for the use case

> As I fail to see the current "idea" maybe there is no idea and simply
> this is the result of too many hands and refactorings done in the
> years, so before being the next hand and applying the next refactoring
> I'd like to collect some thought.

IMO to satisfy so many use cases requires low level complexity. no
one's managed to come with a single idea that can satisfy all

> Do you think all I've written are foolish thoughts or do you think we
> should try to sort this stuff out before releasing mime4j 1.0 ?

IMO the API isn't stable or good enough for  a 1.0 release

some deep design decisions need to be taken about the library. without
the powerful but unintuitive features, mime4j can't be used for
downstream applications that require performance and power. perhaps
mime4j needs to be split into two libraries: a usable, intuitive API
for non-experts and a low level powerful, quick API for downstream
applications. this has worked for other applications.

- robert

View raw message