cocoon-dev mailing list archives

From John Milan <jmi...@DataChannel.com>
Subject RE: [Moving on] SAX vs. DOM part II
Date Wed, 26 Jan 2000 18:37:02 GMT


>> The DOM is an interface definition that does NOT require you to load the
>> entire document into memory just as ODBC is an interface definition that
>> does NOT require you to load the entire database into memory.

> No, but the DOM *is* an object model, and the interface requires certain
> features, such as the ability to get a child count, to be a full DOM.
> Witness Xalan's DTM, which I would call a pseudo-DOM.  Because it is meant
> to be used incrementally, you can't get a child count, which is a major
> limitation to people wanting to do counter loops and the like.

Yes, but then what do we make of DOM Level 2 features like iterators and
filters? It's not very easy to get a child count on those either. Personally,
I think child counts are evil, much like Content-Length is a *major* hassle,
as evidenced by the Content-Length thread just gone by.

>> The DOM is an interface definition that does NOT require you to load the
>> entire document into memory just as ODBC is an interface definition that
>> does NOT require you to load the entire database into memory.

> Notice that the base ODBC definition does not allow you to get row count.

One could argue ODBC is a better definition for what it does :) Again, I
think child counts are tantamount to Content-Length issues. It's much better
to iterate or read content until the end is reached.
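
To make the "iterate until the end" style concrete, here is a minimal sketch
in Python. It assumes the standard-library xml.dom.minidom, which does load
the whole document, so only the consumer-side pattern is illustrated: the
helper iter_children is a hypothetical name, and it never asks for a count.

```python
from xml.dom import minidom

# minidom builds the full tree; it stands in here for any DOM implementation.
doc = minidom.parseString("<rows><row>a</row><row>b</row><row>c</row></rows>")


def iter_children(parent):
    """Yield children one at a time; the end is signalled by None, not a count."""
    node = parent.firstChild
    while node is not None:
        yield node
        node = node.nextSibling


# The consumer never calls childNodes.length, so an incremental
# implementation could hand out each sibling only as it becomes available.
values = [child.firstChild.data for child in iter_children(doc.documentElement)]
```

A counter loop over childNodes.length would force the implementation to know
the total up front, which is the Content-Length problem in another form.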

>> Now, it may not be *easy* to create a DOM with such an implementation,
>> but that is precisely what we are bringing to the table.

> I would be very interested in working with you guys on Xalan's Document
> Table Model (DTM), or something similar to replace it.

We're interested too.

>> Our implementation
>> enabled us to select records from a 10 gigabyte database, produce minimal
>> DOM structure, and transform that DOM via an XSL engine in not much more
>> time than it would take to retrieve results via a simple select
>> statement.
>> Accordingly, I don't believe a DOM approach necessarily has significant
>> impact on memory or speed considerations.

> Returning node lists of a fixed structure is a different thing than
> providing a full DOM tree that fulfills the standard interfaces.

What's a 'full DOM tree', really? Does a consumer of a DOM *really* need to
read gigabytes of data into memory before a DOM may be used? I think of
it as a late-binding DOM: we can dereference nodes on demand.
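
As a rough illustration of dereferencing on demand, the sketch below defers
fetching a node's children until they are first requested. Everything in it
is an assumption for illustration: LazyNode, load_rows and fetch_log are
hypothetical names, and the loader merely stands in for a database query. It
is not a description of any actual implementation.

```python
class LazyNode:
    """Hypothetical node whose children are fetched only when first asked for."""

    def __init__(self, name, loader=None):
        self.name = name
        self._loader = loader    # callable returning an iterable of child nodes
        self._children = None    # None means "not dereferenced yet"

    @property
    def children(self):
        # Bind late: run the loader on first access, then reuse the result.
        if self._children is None:
            self._children = list(self._loader()) if self._loader else []
        return self._children


fetch_log = []


def load_rows():
    # Stand-in for a database query; records that a fetch actually happened.
    fetch_log.append("rows fetched")
    return [LazyNode("row") for _ in range(3)]


# Creating the root does not touch the data source at all.
root = LazyNode("resultset", loader=load_rows)
```

Nothing is fetched until root.children is read, and a second read reuses the
cached nodes instead of querying again.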

>> This might be easier with a SAX implementation in some cases, but other
>> cases, as you have to mention here, SAX actually makes it more difficult
>> by introducing "internal buffers."

> An internal tree pretty much always needs to be made for an XSLT processor.
> The issue is, in the primary, performance-critical case, should the XSLT
> engine be required to use generic DOM interfaces.  The answer is: this is
> very problematic.

> If you look at the design of Xalan, I think you'll see that we are pretty
> much on the same page re the use of the DOM.  Xalan has always tried hard
> to be as DOM-neutral as possible.  But, the fact remains, it is problematic.
> It is one thing to work with a special known DOM implementation... quite
> another to be able to work with any DOM.  And if you can't work with raw
> DOM interfaces, you're not really working with a DOM... you're working with
> a proprietary tree structure.

Absolutely agree with this. In fact, full disclosure here: we had to add at
least one method to our DOM, the ability to selectNodes. However, it seems to
me the most problematic areas are more with data acquisition than with
transformation. Once you have defined a 'result set' DOM, current XSL and
XSLT processors can handle it very well. In fact, if you really wanted total
W3C conformance, you could divide the realization of a DOM into two parts:
data acquisition and representation. Most of the issues of a virtual DOM
really revolve around the former. Most of the issues we've discussed in the
SAX vs DOM debate focus on the latter.

This is somewhat of a 'latest thoughts' statement, but I'm wondering whether
the data request model dictates the result model at all. But I digress.

>> For instance, if a
>> header and footer requested the same data, the SAX model would require
>> two separate events, while the virtual DOM could provide the same node.

> For the result tree??  I don't think this is true, as the DOM has no
> reference model, since each node has to point back up to its parent (OK,
> you might be able to fake it with some really ugly tricks).

That's exactly what else is needed in a DOM: a reference model, and
not just a pointer-based one. In a nutshell, we've given each node an
address. This works really well with URIs too.
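
One way to picture an address-based reference model is the sketch below. It
is purely illustrative and assumes a simple in-memory registry: NodeStore,
the path-like address "/site/title", and the dictionary-shaped nodes are all
hypothetical, chosen only to show a header and a footer sharing one node.

```python
class NodeStore:
    """Hypothetical registry mapping an address (a URI-like string) to a node."""

    def __init__(self):
        self._nodes = {}

    def put(self, address, node):
        self._nodes[address] = node

    def resolve(self, address):
        return self._nodes[address]


store = NodeStore()
store.put("/site/title", {"name": "title", "text": "Cocoon"})

# Header and footer hold addresses, not parent-linked copies of the node.
header = {"name": "header", "refs": ["/site/title"]}
footer = {"name": "footer", "refs": ["/site/title"]}

# Both references resolve to the very same object.
header_title = store.resolve(header["refs"][0])
footer_title = store.resolve(footer["refs"][0])
```

Because the shared node is reached by address rather than by a parent
pointer, it does not need to belong to exactly one place in the tree.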

> (I want to keep on responding to all your good points, but it's getting
> late... unfortunately, this is one of my favorite subjects.)

>> In fact, as a whole, I think it would be much better to take an additive
>> approach. That is, maintain the current DOM interfaces and provide
>> additional SAX capabilities.

> The more I think about the issue in general, and your note, the more I
> agree with this.  I like this approach far more than forcing one or the
> other.  But it will complicate the architecture, and one should avoid
> translation of one model to the other.  Xalan currently does both, for all
> the reasons you name, so this is easy for me to say.  I still strongly
> maintain that for high-performance servers, SAX pipes are the only way to
> go.

I believe they (DOM and SAX) both need to be there. It may complicate the
architecture, or it may be the correct solution :) The general sentiment I've
been reading is 'DOM is good for this stuff, but SAX is good for that stuff.
I wish I could merge the two somehow.' It seems possible to me, as I alluded
to earlier, that this may be the result of trying to combine two conflicting
models (data acquisition vs data presentation) into either a DOM-only or a
SAX-only solution. My general thesis is that a DOM model serves best for data
acquisition, caching and manipulation, while the SAX model serves best for
data transformation. Perhaps this can be generalized as: DOM handles
requests, and SAX handles responses.
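
The "DOM handles requests, SAX handles responses" split might look roughly
like the following Python sketch. It assumes the standard-library
xml.dom.minidom for the request side and xml.sax.saxutils.XMLGenerator as the
SAX consumer; dom_to_sax is a hypothetical helper written for this example,
not part of either library, and it only handles elements and text.

```python
import io
from xml.dom import minidom
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def dom_to_sax(node, handler):
    """Walk a DOM subtree and replay it as SAX events on the given handler."""
    if node.nodeType == node.ELEMENT_NODE:
        attrs = {a.name: a.value for a in node.attributes.values()}
        handler.startElement(node.tagName, AttributesImpl(attrs))
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        handler.characters(node.data)


# Request side: acquire and select data through DOM interfaces.
doc = minidom.parseString('<rows><row id="1">a</row><row id="2">b</row></rows>')
selected = [r for r in doc.getElementsByTagName("row")
            if r.getAttribute("id") == "2"]

# Response side: stream the selection out as SAX events.
out = io.StringIO()
handler = XMLGenerator(out, encoding="utf-8")
handler.startDocument()
for row in selected:
    dom_to_sax(row, handler)
handler.endDocument()
result = out.getvalue()
```

Any other SAX ContentHandler could take the place of XMLGenerator here, which
is the point of keeping the response side event-based.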

Or this may all be wild conjecture :)

John Milan


