james-mime4j-dev mailing list archives

From Robert Burrell Donkin <robertburrelldon...@gmail.com>
Subject Re: Field, RawField, ParsedField and parsing methods
Date Mon, 28 Dec 2009 23:04:53 GMT
On Mon, Dec 28, 2009 at 9:27 PM, Stefano Bagnara <apache@bago.org> wrote:
> 2009/12/28 Robert Burrell Donkin <robertburrelldonkin@gmail.com>:
>> we've struggled to find the right balance between power, performance
>> and usability. IMHO we haven't yet succeeded.
> So we agree we can try to improve things even if this means breaking
> backward compatibility.

IIRC mime4j has broken compatibility with most releases - but care has
been taken to ensure that use cases are not needlessly sacrificed

>>> 1) We have a "Field" interface, a RawField and a ParsedField. Most
>>> code deals with generic Fields but knows when it is a ParsedField or a
>>> RawField. Nowhere do we check the Field implementation to understand if
>>> it is already parsed or not, so it seems we always know when it is a
>>> ParsedField and when it is a RawField. Some code calling getName does
>>> a trim and a lowercase, some other code simply lowercases without
>>> trimming. Why don't we simply canonicalize things in getName and
>>> publish a clear contract about what getName returns?
>> IIRC performance (some downstream applications don't care about
>> canonicalisation and don't want to pay the cost) and power (some
>> downstream apps require uncanonicalised input - this is a requirement
>> for round tripping in particular)
> I guess all of them simply use "getRaw", don't you think?

not necessarily

IMAP, jsieve and httpclient are downstream apps that mime4j needs to
support so it's usually worthwhile seeing if a feature is used by
these libraries before changing it.

> getName and getBody should not be used for roundtripping as they could
> change something anyway (getBody is unfolded, so if you fold again you
> can't be sure you obtain the same result as you could end up folding
> in a different place).

on balance, i think a more fluent API with less downcasting would be
better anyway
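
a small sketch of the folding point quoted above: once a body has been unfolded, folding it again need not reproduce the original line breaks. the class and the naive `fold` policy are hypothetical illustrations, not mime4j code.

```java
// Hypothetical illustration only; these are not mime4j classes.
final class FoldingDemo {

    // Unfold a header field: drop each CRLF that is immediately
    // followed by a space or tab (the whitespace itself is kept).
    static String unfold(String raw) {
        return raw.replaceAll("\r\n(?=[ \t])", "");
    }

    // Naive refold (assumed policy): break at the last space at or
    // before maxLen. A sender may have folded somewhere else entirely.
    static String fold(String unfolded, int maxLen) {
        StringBuilder out = new StringBuilder();
        String rest = unfolded;
        while (rest.length() > maxLen) {
            int cut = rest.lastIndexOf(' ', maxLen);
            if (cut <= 0) {
                break; // no usable break point, leave the remainder as is
            }
            out.append(rest, 0, cut).append("\r\n");
            // Keep the space: it becomes the continuation line's leading whitespace.
            rest = rest.substring(cut);
        }
        return out.append(rest).toString();
    }

    public static void main(String[] args) {
        String raw = "Subject: a short\r\n folded line that the sender broke early";
        String refolded = fold(unfold(raw), 78);
        // The content survives, the original fold position does not.
        System.out.println(refolded.equals(raw));
        System.out.println(unfold(refolded).equals(unfold(raw)));
    }
}
```

so round tripping needs the raw bytes (or the original fold positions) kept alongside any unfolded view.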

>> it's important to remember that there are downstream applications that
>> use the methods and classes directly. so, even if a method does not
>> seem to be used in Mime4J, it may have been added to facilitate a
>> downstream use case. equally, it could be legacy. hard to tell since
>> everything's bundled up together.
> As we are still in 0.x releases and we agree that the exposed
> interfaces/code should be improved we should try to keep track of
> current downstream users and understand exactly what they need to do,
> so as to use them as use cases to help us improve the separation of
> concerns. We don't want to expose every single class and to maintain
> backward compatibility for every single class, so we should start
> selecting things.
> IMHO if we are unable to collect downstream users we should try to
> decide on our own and maybe hide some unused method if we don't think
> it should be used outside, and maybe after releasing a new version
> (0.7) we'll wait for "upgraders" to complain about the missing features
> and, if we find we really removed a useful feature we can add it
> again in the next release (0.8).


sacrificing features used by IMAP, JSieve and httpclient without
replacement is not a good idea

>>> As I fail to see the current "idea" maybe there is no idea and simply
>>> this is the result of too many hands and refactorings done in the
>>> years, so before being the next hand and applying the next refactoring
>>> I'd like to collect some thought.
>> IMO to satisfy so many use cases requires low level complexity. no
>> one's managed to come up with a single idea that can satisfy all
>> requirements.
> We all know the XML parsers world. We have SAX, DOM, StAX, TraX, XOM
> (and also xml databinding apis), and so on.. there is no api to
> satisfy all users and none of them has been obsoleted by the others. xml
> libraries usually expose one or more of those APIs but (AFAICT) none of
> them exposes all of the interfaces in a single library.
> MimeTokenStream is our StAX parser
> MimeStreamParser is our SAX parser
> the "message" package is our DOM

the major types are pull (eg. StAX), event (eg. SAX) and object model
(eg. DOM) and yes, most modern parsers support at least one API for
each (though some are internal). as with most modern XML parsers, the
basic MIME4J parser is pull with event and object model layered on
> in XML world SAX and StAX "events" are mainly based on Strings and at
> most on "QName".

StAX is not event driven and most XML parsers in Java try to avoid
string creation

> There are no Elements, Node at this level or anything
> "DOM" related, yet. In mime4j (wrt streaming apis) we are almost there:
> the model is pretty similar to the xml model, the main difference is
> our "Field" interface that is shared between our DOM and our S(t)AX.
> Talking about "copying" what XML did we know that we have to
> "compromise" on roundtripping (most XML apis out there let you read
> XML or alter XML, but they will lose most of the original formatting
> during the parsing)

the native (internal) APIs often allow direct access to the original
formatting. perhaps the Field problem could be solved by using a
fluent API which fully parses fields only on demand.
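
one possible shape for such a field, as a minimal sketch: the raw text is always kept for round tripping, and the canonical name and unfolded body are computed and cached only when asked for. `LazyField` and its methods are hypothetical, not an existing or proposed mime4j API, and the sketch is not thread-safe.

```java
import java.util.Locale;

// Hypothetical sketch of a field that parses on demand; not a mime4j class.
final class LazyField {
    private final String raw;      // exactly as received, kept for round tripping
    private int colon = -2;        // -2 means "not scanned yet"
    private String canonicalName;  // cached on first request
    private String unfoldedBody;   // cached on first request

    LazyField(String raw) {
        this.raw = raw;
    }

    /** Cheapest view: no parsing at all. */
    String raw() {
        return raw;
    }

    private int colon() {
        if (colon == -2) {
            colon = raw.indexOf(':');
        }
        return colon;
    }

    /** Name as written, neither trimmed nor lowercased. */
    String name() {
        int c = colon();
        return c < 0 ? raw : raw.substring(0, c);
    }

    /** Trimmed, lowercased name; callers that never ask never pay for it. */
    String canonicalName() {
        if (canonicalName == null) {
            canonicalName = name().trim().toLowerCase(Locale.ROOT);
        }
        return canonicalName;
    }

    /** Unfolded body with surrounding whitespace removed. */
    String body() {
        if (unfoldedBody == null) {
            int c = colon();
            String rest = c < 0 ? "" : raw.substring(c + 1);
            unfoldedBody = rest.replaceAll("\r\n(?=[ \t])", "").trim();
        }
        return unfoldedBody;
    }
}
```

callers that only copy fields through would use `raw()`, while callers that need canonical data ask for it explicitly, without downcasting.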

> IMHO our current SAX/StAX parser is almost OK and we should only
> improve naming, packages and maybe a few other things like deciding what
> to do with the "Field" interface.

i'm sure when you've spent more time with it, you'll find more
problems. the stream parser is powerful but configuration is not at
all intuitive and is required for advanced use cases.

in general, it's hard to understand what's low level and what's high
level. plus it's a beast to subclass or debug.

> In our DOM, instead, I see one big "defect" and it is that we don't
> have interfaces for some key nodes: we should add that MIME is a
> different beast than XML, but I think that we should try to model
> interfaces in a package and put there the Message as interface, each
> *Field as interfaces and then have some "builder" service to start a
> new Message from scratch or to parse it using SAX (and we already have
> the MessageBuilder)...
> What about creating interfaces for the DOM and splitting the "Field" used by
> our S*AX from the "Field" used by our DOM ?

IIRC we took out a load of interfaces based on Field since they were
confusing so it's probably worth doing some design work before fitting
new ones...

i was wondering whether a fluent api and proper object model would be
better, exposing different levels of parse detail with lazy caching
but i won't have time to explore it...

>>> Do you think all I've written are foolish thoughts or do you think we
>>> should try to sort this stuff out before releasing mime4j 1.0 ?
>> IMO the API isn't stable or good enough for a 1.0 release
>> some deep design decisions need to be taken about the library. without
>> the powerful but unintuitive features, mime4j can't be used for
>> downstream applications that require performance and power. perhaps
>> mime4j needs to be split into two libraries: a usable, intuitive API
>> for non-experts and a low level powerful, quick API for downstream
>> applications. this has worked for other applications.
> Don't you think that the "current" StAX+SAX+DOM approach works for MIME too?
> IMHO what is "unintuitive" is the way we try to implement them now
> (especially the field parsing and the DOM handling)

there's too much magic in the pull parser, the event stream is neither
powerful nor intuitive (i end up using the pull parser all the time
but couldn't explain the event interface in an easy way) and the object
model isn't complete enough

> e.g: we currently have package dependency cycles again.. I know some of
> you couldn't care less about this, but I think that working without
> cycles and keeping a clear package dependency tree is the only way to
> produce an intuitive result. If I can't create a package tree, or a
> modules-tree, for an application then I can't understand or explain it
> "intuitively".

more thought's needed on that too :-/

- robert
