xml-general mailing list archives

From Tim Bray <tb...@textuality.com>
Subject Re: parser-next-gen goals, plan, and requirements
Date Thu, 13 Jul 2000 00:33:14 GMT
At 05:49 PM 11/07/00 -0700, Arnaud Le Hors wrote:
>At the minimum we need to have the same as Xerces 1. These are:
>
>Validating XML 1.0
>Namespaces
>SAX2
>DOM Level 2
>XML Schemas
>
>In addition, I guess it's a given that we all want:
>
>Modularity, meaning that one should be able to have a jar containing the
>bare minimum XML parser for instance.

I think this is really important.  There is going to be some proportion of 
the time N, where the parser is just pulling out elements and attributes,
not validating or XPathing or DOMbuilding.  Nobody knows what N is but my
guess is it's going to be surprisingly high, like "most of the time".  This 
kind of parsing needs to be fast and it needs to have a light memory
footprint.
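
A minimal sketch of that "just pulling out elements and attributes" case,
assuming the JAXP factory classes that ship with a current JDK (not any
particular Xerces release). The class name ElementCounter and its count()
helper are made up for illustration; only the SAX2 ContentHandler callback
is the real interface. No tree is built and nothing is validated:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements and attributes from SAX2 events only.
public class ElementCounter extends DefaultHandler {
    public int elements = 0;
    public int attributes = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Called once per start tag; nothing is retained after the callback returns.
        elements++;
        attributes += atts.getLength();
    }

    // Hypothetical helper: run a non-validating SAX2 parse over an in-memory string.
    public static ElementCounter count(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ElementCounter handler = new ElementCounter();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        ElementCounter c = count("<a x='1'><b y='2' z='3'/><c/></a>");
        System.out.println(c.elements + " elements, " + c.attributes + " attributes");
    }
}
```

The handler keeps two integers and no references to the document, which is
the light-footprint property being argued for.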

Question: if you build a low-level parser that 

(a) implements SAX2, and
(b) if asked, parses the DTD and stuffs it into a reasonable Java data
    structure,

can you build all the other pieces that Arnaud lists on top of that and 
have acceptable efficiency?  
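
For point (b), one possible shape is sketched below. It assumes the SAX2
extension interface org.xml.sax.ext.DeclHandler, registered through the
standard declaration-handler property, and a JAXP parser from a current
JDK. The class DtdCollector, its collect() helper, and the choice of plain
maps as the "reasonable Java data structure" are illustrative assumptions,
not a proposed design:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers DTD declarations into plain maps.
public class DtdCollector implements DeclHandler {
    // element name -> content model string as reported by the parser
    public final Map<String, String> contentModels = new LinkedHashMap<>();
    // "element/@attribute" -> declared attribute type
    public final Map<String, String> attributeTypes = new LinkedHashMap<>();

    @Override
    public void elementDecl(String name, String model) {
        contentModels.put(name, model);
    }

    @Override
    public void attributeDecl(String eName, String aName, String type, String mode, String value) {
        attributeTypes.put(eName + "/@" + aName, type);
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        // Entities are ignored in this sketch.
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        // Entities are ignored in this sketch.
    }

    // Hypothetical helper: parse a string and return the collected declarations.
    public static DtdCollector collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DtdCollector dtd = new DtdCollector();
            // Standard SAX2 property for receiving DTD declaration events.
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", dtd);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return dtd;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        DtdCollector dtd = collect(
            "<!DOCTYPE a [<!ELEMENT a (b*)><!ELEMENT b EMPTY>"
            + "<!ATTLIST b id CDATA #IMPLIED>]><a><b id='1'/></a>");
        System.out.println(dtd.contentModels.keySet() + " " + dtd.attributeTypes);
    }
}
```

A validator layered on top would consult these maps while listening to the
same SAX2 event stream; whether that is efficient enough is exactly the
open question.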

I don't know how representative I am, but for me, validation (at either
DTD or schema level) is mostly for debugging; at runtime the validation
logic tends to be hardwired and app-specific.  Thus I'd be willing to
trade quite a lot of validation performance for fast SAX2 events and
a light memory footprint. 

My intuition says that as regards building the tree and all that follows
from it, making that go through SAX2 shouldn't be a performance hit... or
are there other experiences?
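
To make the layering concrete, here is a rough sketch of a tree builder
that sits entirely on top of SAX2 events. It assumes the JAXP and W3C DOM
classes from a current JDK; SaxTreeBuilder and build() are hypothetical
names, and namespaces, comments, and processing instructions are left out
to keep it short:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: builds a DOM tree from SAX2 callbacks alone.
public class SaxTreeBuilder extends DefaultHandler {
    private final Document doc;
    // Stack of currently open nodes; the Document itself sits at the bottom.
    private final Deque<Node> open = new ArrayDeque<>();

    public SaxTreeBuilder(Document doc) {
        this.doc = doc;
        open.push(doc);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        open.peek().appendChild(e);
        open.push(e);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data may arrive in several chunks; each becomes a text node.
        open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    // Hypothetical helper: parse a string with SAX2 and return the resulting tree.
    public static Document build(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new SaxTreeBuilder(doc));
            reader.parse(new InputSource(new StringReader(xml)));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Element root = build("<order id='7'><item>tea</item></order>").getDocumentElement();
        System.out.println(root.getTagName() + " " + root.getAttribute("id")
            + " " + root.getFirstChild().getTextContent());
    }
}
```

The per-event work here is one node allocation and one append, so the
SAX2 layer itself adds little beyond the callback dispatch; measuring that
against a parser that builds the tree directly would settle the question.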

Wild-eyed suggestion: why not look into adopting James Clark's XT?  It's
a pretty #!%@^#@ good parser IMHO.  Also it's from neither IBM nor Sun :)
 -Tim

