james-mime4j-dev mailing list archives

From Stefano Bagnara <apa...@bago.org>
Subject Re: Field, RawField, ParsedField and parsing methods
Date Mon, 28 Dec 2009 23:42:03 GMT
2009/12/29 Robert Burrell Donkin <robertburrelldonkin@gmail.com>:
> On Mon, Dec 28, 2009 at 9:27 PM, Stefano Bagnara <apache@bago.org> wrote:
>> IMHO if we are unable to collect downstream users we should try to
>> decide on our own and maybe hide some unused methods if we don't think
>> they should be used outside, and maybe after releasing a new version
>> (0.7) we'll wait for "upgraders" to complain about the missing features
>> and, if we find we really removed a useful feature, we can add it
>> again in the next release (0.8).
>
> -1
>
> sacrificing features used by IMAP, JSieve and httpclient without
> replacement is not a good idea

I said "if we are unable to collect downstream users". This means that
we *should* check the use case of every downstream user we *know* of,
but the only way to "ping" unknown downstream users is to simply ignore
them and wait for them to come here complaining. So, I will collect the
jsieve, imap and httpclient usages, but I don't know what other
downstream users are around.

>> in XML world SAX and StAX "events" are mainly based on Strings and at
>> most on "QName".
>
> StaX is not event driven and most XML parsers in Java try to avoid
> string creation

I don't understand what you are saying. AFAIK StAX and SAX only give
you access to QNames and Strings
(most methods of javax.xml.stream.XMLStreamReader return Strings, and
the org.xml.sax.ContentHandler callback methods pass Strings). I'm not
saying that they are optimized or that we should use Strings, but I
really don't understand your statement.
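
To make the point concrete, here is a minimal sketch using only the JDK's
own StAX and SAX APIs (nothing from mime4j). The class and method names
(StringBasedXmlApis, staxNames, saxNames) are made up for illustration.
It shows that element names reach the caller as plain Strings in both
the pull and the push style:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: the class and method names are hypothetical.
final class StringBasedXmlApis {

    // Pull style (StAX): the caller asks for the next event and reads Strings.
    static List<String> staxNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName()); // returns a String
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    // Push style (SAX): the parser calls back and passes Strings.
    static List<String> saxNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace aware, so use qName.
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<message><header name=\"Subject\"/><body>hi</body></message>";
        System.out.println(staxNames(xml));
        System.out.println(saxNames(xml));
    }
}
```

Both calls should list the same three element names (message, header,
body); neither API hands out anything like a DOM Element or Node.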

>> There are no Elements, Nodes or anything "DOM" related at this
>> level yet. In mime4j (wrt streaming APIs) we are almost there:
>> the model is pretty similar to the XML model, the main difference is
>> our "Field" interface that is shared between our DOM and our S(t)AX.
>> Talking about "copying" what XML did, we know that we have to
>> "compromise" on roundtripping (most XML APIs out there let you read
>> XML or alter XML, but they will lose most of the original formatting
>> during the parsing)
>
> the native (internal) APIs often allow direct access to the original
> formatting. perhaps the Field problem could be solved by using a
> fluent API which fully parses fields only on demand.

This doesn't happen with XML. I have worked with these APIs a lot, and
SAX/StAX/DOM don't let you alter an input XML file while preserving
formatting, comments, CDATA and so on. BTW, currently in mime4j there
is no way to alter the stream while keeping all of the formatting
either. As an example, both CRLF and LF are used as header line
terminators, but they are removed from the line as soon as it is
read, so when you output that header you don't know whether it was LF
or CRLF terminated. I'm not sure that it is worth allowing
malformed/non-canonical MIME messages to be altered while preserving
their malformations (the header EOL is just one of many issues I can
list).
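
The EOL issue can be sketched in a few lines. This is not mime4j's
actual line-reading code; it uses java.io.BufferedReader as a stand-in,
and the class name LineEndingLoss is hypothetical. Once the terminator
has been stripped, a CRLF header and an LF header become
indistinguishable:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in, not mime4j code: the class name is hypothetical.
final class LineEndingLoss {

    // Reads header lines and, like readLine() does, drops the terminator.
    static List<String> readLines(String raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(raw))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // "\r\n" or "\n" is already gone here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String crlf = "Subject: hello\r\nTo: a@example.org\r\n";
        String lf = "Subject: hello\nTo: a@example.org\n";
        // The two inputs differ, but the parsed lines are equal, so a
        // writer working from the lines cannot restore the original EOL.
        System.out.println(crlf.equals(lf));
        System.out.println(readLines(crlf).equals(readLines(lf)));
    }
}
```

Preserving the original terminator would mean carrying it along with
every line, which is the kind of cost I am not sure is worth paying.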

>> IMHO our current SAX/StAX parser is almost OK and we should only
>> improve naming, packages and maybe a few other things, like deciding
>> what to do with the "Field" interface.
>
> i'm sure when you've spent more time with it, you'll find more
> problems. the stream parser is powerful but configuration is not at
> all intuitive and is required for advanced use cases.

I've spent time on it, and I'm sure I'll find more issues, but in the
pull/push parsers we have far fewer entry points, so it should be
easier to deal with them. BTW we'll discuss this when I have change
proposals. No need to discuss nothing ;-)

> in general, it's hard to understand what's low level and what's high
> level. plus it's a beast to subclass or debug.

True.

>> In our DOM, instead, I see one big "defect" and it is that we don't
>> have interfaces for some key nodes: we should add that MIME is a
>> different beast than XML, but I think that we should try to model
>> interfaces in a package and put there the Message as interface, each
>> *Field as interfaces and then have some "builder" service to start a
>> new Message from scratch or to parse it using SAX (and we already have
>> the MessageBuilder)...
>>
>> What about creating interfaces for the DOM and splitting the "Field"
>> used by our S*AX from the "Field" used by our DOM?
>
> IIRC we took out a load of interfaces based on Field since they were
> confusing so it's probably worth doing some design work before fitting
> new ones...

My proposal is to introduce interfaces for the objects that are part of
the DOM. The DOM should then expose "mutation" methods. So, if you
have to add a header, the DOM node for the Header will expose methods
to add one, without having to rely on direct access to parser
classes or field implementations.
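
A rough sketch of what I mean, under the assumption that the DOM gets
its own interfaces. Every name here (FieldNode, HeaderNode,
SimpleHeaderNode, addField, removeFields) is hypothetical and is not
existing mime4j API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical DOM-side field: a name and an unparsed body, nothing more.
interface FieldNode {
    String getName();
    String getBody();
}

// Hypothetical DOM-side header that exposes its own mutation methods.
interface HeaderNode {
    List<FieldNode> getFields();
    void addField(String name, String body);
    int removeFields(String name);
}

// One possible implementation; callers only ever see the interfaces.
final class SimpleHeaderNode implements HeaderNode {
    private final List<FieldNode> fields = new ArrayList<>();

    @Override
    public List<FieldNode> getFields() {
        return Collections.unmodifiableList(fields);
    }

    @Override
    public void addField(String name, String body) {
        fields.add(new FieldNode() {
            @Override
            public String getName() { return name; }
            @Override
            public String getBody() { return body; }
        });
    }

    @Override
    public int removeFields(String name) {
        int before = fields.size();
        // Header field names are compared case-insensitively.
        fields.removeIf(f -> f.getName().equalsIgnoreCase(name));
        return before - fields.size();
    }
}

final class DomMutationDemo {
    public static void main(String[] args) {
        HeaderNode header = new SimpleHeaderNode();
        header.addField("Subject", "hello");
        header.addField("To", "a@example.org");
        header.removeFields("subject");
        for (FieldNode f : header.getFields()) {
            System.out.println(f.getName() + ": " + f.getBody());
        }
    }
}
```

The calling code never touches a parser class or a concrete field
implementation; it only talks to the header node.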

> i was wondering whether a fluent api and proper object model would be
> better, exposing different levels of parse detail with lazy caching
> but i won't have time to explore it...

Does "proper object model" include an interface for each node object?
Otherwise, can you give an example of a "fluent api and proper object
model"?

Stefano
