xml-general mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject Re: Progressive parsing
Date Mon, 26 Aug 2002 19:39:26 GMT
First, thanks to Neil for his answer (on the Xerces-j-user list), which I 
can no longer find to quote properly.
Here is an attempted solution that seems to satisfy my needs quite well, 
but I would welcome more comments on the quality of the approach.

To re-parse (or parse) a single element and its child content, it 
seems sufficient to have the following information: the URL of the 
document, the byte positions (start and end) of the whole element, and 
the byte positions of the start tags of all its ancestor elements 
(so that the corresponding closing tags can be fed as well).
This could easily be piped through a stream that reads only the 
necessary bytes and skips the rest, so that the parser is fed only what 
it needs.

Here's an example:
<a>    <b><c>blop</c></b>     <b id="b1"><c>blip</c></b>
     </a>

To reparse only the content of the b element with id "b1", I can then 
feed the parser:
			<a><b id="b1"><c>blip</c></b></a>
thus avoiding the presumably enormous content of the first b element.
(Note: this says nothing about what the parse results are fed to. I am 
thinking of JDOM, but one is free to choose; in the end it is just SAX 
events.)
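
A minimal sketch of the assembly step for the example above, in Java. The 
class and method names (FragmentReparse, Ancestor, assemble, elementNames) 
are hypothetical, the byte offsets are assumed to be known already (here 
they are simply computed with indexOf on an ASCII string), and the 
document is assumed to be UTF-8 without DTD or namespace context to carry 
over:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class FragmentReparse {

    /** Hypothetical index entry: an ancestor's name and the byte range [start, end) of its start tag. */
    static final class Ancestor {
        final String name;
        final int start;
        final int end;

        Ancestor(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** Copies the ancestors' start tags, then the element itself, then synthesizes the closing tags. */
    static byte[] assemble(byte[] doc, List<Ancestor> ancestors, int elemStart, int elemEnd) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Ancestor a : ancestors) {
            out.write(doc, a.start, a.end - a.start);
        }
        out.write(doc, elemStart, elemEnd - elemStart);
        // Close the ancestors innermost first. Assumes the document is UTF-8.
        for (int i = ancestors.size() - 1; i >= 0; i--) {
            byte[] close = ("</" + ancestors.get(i).name + ">").getBytes(StandardCharsets.UTF_8);
            out.write(close, 0, close.length);
        }
        return out.toByteArray();
    }

    /** Feeds the reduced document to a SAX parser and records the element names it reports. */
    static List<String> elementNames(byte[] xml) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes atts) {
                        names.add(qName);
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("reduced document did not parse", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String src = "<a>    <b><c>blop</c></b>     <b id=\"b1\"><c>blip</c></b>\n     </a>";
        byte[] doc = src.getBytes(StandardCharsets.UTF_8);
        // Character and byte offsets coincide only because the sample is pure ASCII.
        int start = src.indexOf("<b id");
        int end = src.indexOf("</b>", start) + "</b>".length();
        List<Ancestor> ancestors = List.of(new Ancestor("a", 0, 3));
        byte[] reduced = assemble(doc, ancestors, start, end);
        System.out.println(new String(reduced, StandardCharsets.UTF_8)); // <a><b id="b1"><c>blip</c></b></a>
        System.out.println(elementNames(reduced));                       // [a, b, c]
    }
}
```

The handler here only collects names; a JDOM builder or any other SAX 
consumer could be plugged in at the same place.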

I see at least two applications of this:

- an XML source editor that has, say, a tree view could reparse much 
less and thereby be much more responsive (try jEdit's excellent 
xml-mode: the parsing step is heavy!).

- to make a poor man's (read-only) database of XML content, it would be 
sufficient to build an index of the elements that have an id; the 
indexed element would then be fed to the parser in response to a query.
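
The second application could be sketched as follows, again in Java. 
IdIndex and its methods are hypothetical names, and how the byte offsets 
get into the index is left open (that is exactly the question further 
down); the sketch only shows that answering a query needs a seek and one 
read of the element's bytes, not a scan of the whole file:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class IdIndex {
    // id -> {start, end} byte offsets of the whole element, end exclusive
    private final Map<String, long[]> spans = new HashMap<>();

    /** Registers the byte range of the element carrying the given id. */
    void put(String id, long start, long end) {
        spans.put(id, new long[] {start, end});
    }

    /** Reads only the bytes of the element registered under id; returns null if the id is unknown. */
    byte[] read(Path file, String id) {
        long[] span = spans.get(id);
        if (span == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) (span[1] - span[0])];
            raf.seek(span[0]);   // skip everything before the element
            raf.readFully(buf);  // read exactly the element's bytes
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned bytes would still have to be wrapped in the ancestors' start 
and closing tags before being handed to a parser.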

But is this good XML practice?
I am clearly losing the ability to apply full validation (that is, I 
could only revalidate the element's content). Is a schema 
interchangeable in terms of root element, as a DTD is? What about 
RELAX NG schemas?

Finally, to the Xerces makers/users: how do I get the byte position of 
an element start tag that the SAX parser has just handed to me?

Thanks.

Paul



On Thursday, July 25, 2002, at 02:58, Paul Libbrecht wrote:
> Although this request is only about parsing, I think it is general 
> enough to be posted to this list.
>
> Here's a simple problem: one of our applications reads a series of XML 
> documents, all using the same DTD declarations. If I understand 
> correctly, at least from the SAX or JAXP interfaces, the parser will 
> read the DTD(s) completely every time.
> This looks like a real waste of resources. Do some parsers, and 
> preferably a standard, have a way to avoid this and re-use the same 
> parsed DTD every time?
>
>
> A related issue arises when building an XML editor that lets the user 
> edit the source code: you would like the internal XML representation 
> to be updated quickly (ideally all the time). For this, however, we 
> would need the parser to be able to parse only, say, the biggest 
> element containing the changes.
> And for this, some more information would have to be kept, at least 
> something similar to a stack of namespaces for each location.



