james-mime4j-dev mailing list archives

From Stefano Bagnara <apa...@bago.org>
Subject Re: Field, RawField, ParsedField and parsing methods
Date Mon, 28 Dec 2009 21:27:46 GMT
2009/12/28 Robert Burrell Donkin <robertburrelldonkin@gmail.com>:
> we've struggled to find the right balance between power, performance
> and usability. IMHO we haven't yet succeeded.

So we agree we can try to improve things even if this means breaking
backward compatibility.

>> 1) We have a "Field" interface, a RawField and a ParsedField. Most
>> code deals with generic Fields but knows when it is a ParsedField or a
>> RawField. Nowhere do we check the Field implementation to understand if
>> it is already parsed or not, so it seems we always know when it is a
>> ParsedField and when it is a RawField. Some code calling getName does
>> a trim and a lowercase, some other code simply lowercases without
>> trimming. Why don't we simply canonicalize things in getName and
>> publish a clear contract about what getName returns?
> IIRC performance (some downstream applications don't care about
> canonicalisation and don't want to pay the cost) and power (some
> downstream apps require uncanonicalised input - this is a requirement
> for round tripping in particular)

I guess all of them simply use "getRaw", don't you think?
getName and getBody should not be used for roundtripping, as they could
change something anyway (getBody is unfolded, so if you fold it again
you can't be sure of obtaining the same result: you could end up
folding in a different place).
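
To make the getRaw / getName / getBody split concrete, here is a minimal
sketch. SketchField is a hypothetical class written for this discussion,
not the existing mime4j Field API. It assumes the raw header line is kept
as a String, and it computes the canonical name lazily so that callers
who only want getRaw never pay for canonicalisation.

```java
import java.util.Locale;

/** Hypothetical sketch for discussion; NOT the mime4j Field API. */
final class SketchField {
    private final String raw;   // header exactly as read, folding included
    private String name;        // canonical name, computed on first use

    SketchField(String raw) {
        this.raw = raw;
    }

    /** Original text, untouched: the only safe basis for roundtripping. */
    String getRaw() {
        return raw;
    }

    /** Contract: the field name, trimmed and lower-cased. */
    String getName() {
        if (name == null) {
            int colon = raw.indexOf(':');
            String n = colon < 0 ? raw : raw.substring(0, colon);
            name = n.trim().toLowerCase(Locale.ROOT);
        }
        return name;
    }

    /** Unfolded body: a line break followed by whitespace is removed. */
    String getBody() {
        int colon = raw.indexOf(':');
        String body = colon < 0 ? "" : raw.substring(colon + 1);
        return body.replaceAll("\r?\n(?=[ \t])", "").trim();
    }
}
```

With this shape, getBody discards the position of the original fold, so
only getRaw can reproduce the input byte for byte.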

> it's important to remember that there are downstream applications that
> use the methods and classes directly. so, even if a method does not
> seem to be used in Mime4J, it may have been added to facilitate a
> downstream use case. equally, it could be legacy. hard to tell since
> everything's bundled up together.

As we are still in 0.x releases and we agree that the exposed
interfaces/code should be improved, we should try to keep track of
current downstream users and understand exactly what they need to do,
so that we can use them as use cases to help us improve the separation
of concerns. We don't want to expose every single class and to maintain
backward compatibility for every single class, so we should start
selecting things.

IMHO if we are unable to collect downstream users we should try to
decide on our own and maybe hide some unused methods if we don't think
they should be used outside. Then, after releasing a new version
(0.7), we can wait for "upgraders" to complain about the missing
features and, if we find we really removed a useful feature, we can
add it again in the next release (0.8).

>> As I fail to see the current "idea", maybe there is no idea and this
>> is simply the result of too many hands and refactorings done over the
>> years, so before being the next hand and applying the next refactoring
>> I'd like to collect some thoughts.
> IMO to satisfy so many use cases requires low level complexity. no
> one's managed to come up with a single idea that can satisfy all
> requirements.

We all know the XML parsers world. We have SAX, DOM, StAX, TrAX, XOM
(and also XML databinding APIs), and so on. There is no single API that
satisfies all users, and none of them has been made obsolete by the
others. XML libraries usually expose one or more of those APIs but
(AFAICT) none of them exposes all of the interfaces in a single library.

MimeTokenStream is our StAX parser
MimeStreamParser is our SAX parser
the "message" package is our DOM

In the XML world SAX and StAX "events" are mainly based on Strings and
at most on "QName". There are no Elements, Nodes or anything else
"DOM" related at this level. In mime4j (wrt streaming APIs) we are
almost there: the model is pretty similar to the XML model, the main
difference being our "Field" interface, which is shared between our DOM
and our S(t)AX. If we talk about "copying" what XML did, we know that
we have to "compromise" on roundtripping (most XML APIs out there let
you read XML or alter XML, but they will lose most of the original
formatting during the parsing).
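
For readers less familiar with the push/pull distinction, here is a toy
sketch of both styles over header lines, with events carrying only
Strings. HeaderHandler and HeaderEvents are hypothetical names invented
for illustration; they are not MimeStreamParser or MimeTokenStream, and
the input is assumed to be already-unfolded "Name: value" lines.

```java
import java.util.Iterator;
import java.util.List;

/** Push style (SAX-like): the parser calls the handler. Hypothetical. */
interface HeaderHandler {
    void field(String name, String body);
}

/** Hypothetical toy parser over already-unfolded header lines. */
final class HeaderEvents {
    private HeaderEvents() {}

    /** Push style: the parser drives and calls back once per field. */
    static void push(List<String> lines, HeaderHandler handler) {
        for (String line : lines) {
            String[] nv = split(line);
            handler.field(nv[0], nv[1]);
        }
    }

    /** Pull style (StAX-like): the caller asks for the next event. */
    static Iterator<String[]> pull(List<String> lines) {
        final Iterator<String> it = lines.iterator();
        return new Iterator<String[]>() {
            public boolean hasNext() { return it.hasNext(); }
            public String[] next() { return split(it.next()); }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Splits "Name: value" into {name, value}; no colon means empty value. */
    private static String[] split(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return new String[] { line.trim(), "" };
        }
        return new String[] {
            line.substring(0, colon).trim(),
            line.substring(colon + 1).trim()
        };
    }
}
```

Neither style needs a Field object here: both hand out plain Strings,
which is the property the XML streaming APIs have.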

IMHO our current SAX/StAX parser is almost OK and we should only
improve naming, packages and maybe a few other things, like deciding
what to do with the "Field" interface.

In our DOM, instead, I see one big "defect": we don't have interfaces
for some key nodes. I should add that MIME is a different beast than
XML, but I think that we should try to model interfaces in a package,
putting there the Message as an interface and each *Field as an
interface, and then have some "builder" service to start a new Message
from scratch or to parse it using SAX (and we already have the
MessageBuilder)...

What about creating interfaces for the DOM and splitting the "Field"
used by our S*AX from the "Field" used by our DOM?
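
A rough sketch of what that split could look like. Every name below
(StreamField, DomField, DomMessage, SketchBuilder) is hypothetical and
chosen only for illustration; none of them exists in mime4j today, and
the real interfaces would certainly need more methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Streaming side: plain strings, no DOM types (hypothetical name). */
interface StreamField {
    String getName();
    String getBody();
}

/** DOM side: its own field interface (hypothetical name). */
interface DomField {
    String getName();
    String getBody();
}

/** DOM side: the message as an interface (hypothetical name). */
interface DomMessage {
    List<DomField> getFields();
    DomField getField(String name);
}

/** Hypothetical builder: consumes streaming fields, produces a DOM. */
final class SketchBuilder {
    private final List<DomField> fields = new ArrayList<DomField>();

    /** Called once per streaming field; copies it into a DOM node. */
    SketchBuilder field(StreamField f) {
        final String name = f.getName();
        final String body = f.getBody();
        fields.add(new DomField() {
            public String getName() { return name; }
            public String getBody() { return body; }
        });
        return this;
    }

    /** Returns an immutable message holding the fields added so far. */
    DomMessage build() {
        final List<DomField> snapshot =
                Collections.unmodifiableList(new ArrayList<DomField>(fields));
        return new DomMessage() {
            public List<DomField> getFields() { return snapshot; }
            public DomField getField(String name) {
                for (DomField f : snapshot) {
                    if (f.getName().equalsIgnoreCase(name)) {
                        return f;
                    }
                }
                return null; // no field with that name
            }
        };
    }
}
```

The point of the sketch is the direction of the dependency: the DOM
package knows about the streaming types, never the other way round.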

>> Do you think all I've written are foolish thoughts or do you think we
>> should try to sort this stuff out before releasing mime4j 1.0 ?
> IMO the API isn't stable or good enough for a 1.0 release
> some deep design decisions need to be taken about the library. without
> the powerful but unintuitive features, mime4j can't be used for
> downstream applications that require performance and power. perhaps
> mime4j needs to be split into two libraries: a usable, intuitive API
> for non-experts and a low level powerful, quick API for downstream
> applications. this has worked for other applications.

Don't you think that the "current" StAX+SAX+DOM approach works for MIME too?
IMHO what is "unintuitive" is the way we try to implement them now
(especially the field parsing and the DOM handling).

e.g.: we currently have package dependency cycles again. I know some of
you couldn't care less about this, but I think that working without
cycles and keeping a clear package dependency tree is the only way to
produce an intuitive result. If I can't create a package tree, or a
module tree, for an application then I can't understand or explain it.


> - robert
