abdera-dev mailing list archives

From James M Snell <jasn...@gmail.com>
Subject Re: Axiom
Date Wed, 18 Oct 2006 18:33:15 GMT

Garrett Rooney wrote:
> [snip]
>  1) The ability to have extensions that don't peek into the parser
> implementation.
>  2) The ability to do async parsing (i.e. a getFirstSibling() on a
> node would have the option of returning "nope, ain't got that data
> yet").

To these I would add...

3.  Incremental parsing.  The input stream needs to be consumed
    incrementally
4.  Ability to preserve the complete XML infoset of input
5.  Ability to use XPath to navigate the structure
6.  Ability to use a streaming API for reading and writing feeds without
    building up an object model
7.  Automatically handle base64 encoding/decoding in content elements
8.  Ability to filter out unwanted parse events
9.  Automatic detection of character set encoding
10. Fewer dependencies.
11. Ability to parse multiple document formats (Atom feed and entry
    docs, APP service and category docs, and arbitrary XML to support
    flexible entry content options)
12. Minimal changes to the core model APIs
13. toString() on the elements must return serialized XML
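
As a rough illustration of items 3, 6 and 8 together, here is a minimal
sketch that uses the JDK's StAX API (javax.xml.stream) as a stand-in.
It is not Abdera code: the class name StreamingFeedSketch and the method
entryTitles are made up for the example. The reader pulls events from
the input only as the loop asks for them, ignores every event it does
not care about, and never builds an object model.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; not part of Abdera.
public class StreamingFeedSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Collects the text of each atom:title found inside an atom:entry. */
    public static List<String> entryTitles(Reader in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> titles = new ArrayList<>();
        boolean inEntry = false;
        try {
            // Each next() call pulls one more event from the input stream.
            while (reader.hasNext()) {
                int event = reader.next();
                boolean isElement = event == XMLStreamConstants.START_ELEMENT
                        || event == XMLStreamConstants.END_ELEMENT;
                // "Filter": skip everything that is not an Atom element event.
                if (!isElement || !ATOM_NS.equals(reader.getNamespaceURI())) {
                    continue;
                }
                String name = reader.getLocalName();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("entry".equals(name)) {
                        inEntry = true;
                    } else if (inEntry && "title".equals(name)) {
                        // Assumes a text-only title; leaves the reader on </title>.
                        titles.add(reader.getElementText());
                    }
                } else if ("entry".equals(name)) {
                    inEntry = false;
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }
}
```

Calling entryTitles(new StringReader(feedXml)) on a feed with two
entries would return the two entry titles and skip the feed-level title.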

> Note that option 1 can naturally be made to go away by dropping the
> concept of having multiple parser back ends.  I don't know if that's
> something we want to do, but it might be.  I know that James has
> mentioned the existence of alternate back ends within IBM, my real
> question is what is the motivation for the existence of such things.
> Are there problems they are trying to solve by using an alternate
> parser that we could solve in our default?

At this point I think I'd be fine with moving away from the concept of
allowing alternative backend parsers.  I still think we need to maintain
a clear separation between the model interfaces and the implementation,
but our code can be greatly simplified if we just stick with one way of
doing things.  Others can choose to write up alternative impls of the
APIs if they wish, but that's not our concern.
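
To make the interface/implementation split concrete, here is a minimal
sketch. All names in it (ModelSeparationSketch, TitleElement,
SimpleTitle) are hypothetical and are not the actual Abdera model
classes. Callers depend only on the interface; the one bundled
implementation lives behind it, and anyone else could supply their own.
The toString() here also shows the idea from item 13.

```java
// Hypothetical example; names are illustrative, not Abdera's API.
public class ModelSeparationSketch {

    /** Model interface: the only type calling code should depend on. */
    public interface TitleElement {
        String getText();
    }

    /** The single default implementation that ships with the library. */
    public static final class SimpleTitle implements TitleElement {
        private final String text;

        public SimpleTitle(String text) {
            this.text = text;
        }

        @Override
        public String getText() {
            return text;
        }

        /** Serializes the element as XML, escaping markup characters. */
        @Override
        public String toString() {
            return "<title xmlns=\"http://www.w3.org/2005/Atom\">"
                    + escape(text) + "</title>";
        }

        private static String escape(String s) {
            // '&' must be replaced first so later entities are not re-escaped.
            return s.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
        }
    }
}
```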

> Alternatively, we'd need our model to expose more information, so that
> we can do things like "insert new node into the tree here" without
> having to dip into the guts of the parser.  This would allow extension
> nodes to live as full citizens alongside nodes that we create via the
> parser.

I see the new stuff operating in two different modes.

1. Streaming.  No object model would be maintained.  Just a stream of
parse events.

2. Infoset. An infoset object model would be built up.  This is
equivalent to what we have currently and, like you said, would need to
support a bit more than we currently allow.
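
For the infoset mode, a minimal sketch of the idea using the JDK's DOM
and XPath APIs as stand-ins (again, not Abdera code; InfosetModeSketch
and its methods are invented for the example). The whole document is
held in memory, so it can be navigated with XPath (item 5 above) and an
extension node can be inserted into the tree through the model alone,
without touching the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Abdera.
public class InfosetModeSketch {

    /** Parses the full document into an in-memory tree. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates an XPath expression and returns the text of each match. */
    public static List<String> select(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    /** Appends an extension element to the root, using only the model. */
    public static void addExtension(Document doc, String ns, String name, String text) {
        Element ext = doc.createElementNS(ns, name);
        ext.setTextContent(text);
        doc.getDocumentElement().appendChild(ext);
    }
}
```

The XPath expressions can match on local-name() to avoid setting up a
NamespaceContext, for example
/*[local-name()='feed']/*[local-name()='entry']/*[local-name()='title'].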

- James
