cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [RT] the quest for the perfect template language
Date Thu, 10 Apr 2003 08:59:48 GMT
on 4/9/03 10:40 PM Hunsberger, Peter wrote:

> Stefano Mazzocchi <stefano@apache.org> asked:
> 
> 
>>So, this list seems full of XSLT lovers, then get your brain cells
>>working: how do we sort the performance issues of document()?
>>
> 
> 
> I realize this isn't what you're asking for, but the following came up on
> xml-dev the other day:
> 
> http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html
> 
> It's a comparison of the performance of various parsers.  Interestingly
> enough (when considered with some of the other discussion in this thread) a
> pull model parser (XPP) comes out on top most of the time.

damn, you spoiled my future RT about pulling vs. pushing pipelines :-)

> Isn't the document issue really attacked by treating it exactly as any other
> *internal* Cocoon URI reference (via the URI resolver hook)? 

The problem is that you are pulling data from a stream that gets pushed
to you.

This is the same impedance mismatch you get with JSP/Velocity as
generators, where a parser needs to be placed in between and performance
is degraded compared to a native-SAX, push-oriented generation stage
which is directly connected to the pipe.

If you do something like

  document("cocoon://whatever#//blah[foo='bar']")

you have to consume *ALL* the SAX events that are given to you by the
underlying URI.

It would be like performing "select * from customers where name =
'stefano'" by having the entire table dumped to you one row at a time
and simply discarding everything you don't need!
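
To make the cost concrete, here is a minimal sketch of that "dump the
whole table" behaviour. It uses Python's standard xml.sax module as a
stand-in for a Cocoon SAX pipeline; the names BlahFilter and
build_document are hypothetical, and foo is assumed to be a child
element of blah. The handler cannot ask the producer for only the
matching element, so it sees every event and throws most of them away.

```python
import xml.sax


class BlahFilter(xml.sax.ContentHandler):
    """Counts <blah> elements whose <foo> child has the text 'bar'.

    Hypothetical illustration: every event is pushed to the handler,
    whether or not it is relevant to the query.
    """

    def __init__(self):
        super().__init__()
        self.events_seen = 0      # every callback the parser pushes at us
        self.matches = 0          # <blah> elements that satisfy foo='bar'
        self._in_blah = False
        self._in_foo = False
        self._foo_text = []
        self._blah_matches = False

    def startElement(self, name, attrs):
        self.events_seen += 1
        if name == "blah":
            self._in_blah = True
            self._blah_matches = False
        elif name == "foo" and self._in_blah:
            self._in_foo = True
            self._foo_text = []

    def characters(self, content):
        self.events_seen += 1
        if self._in_foo:
            self._foo_text.append(content)

    def endElement(self, name):
        self.events_seen += 1
        if name == "foo" and self._in_foo:
            self._in_foo = False
            if "".join(self._foo_text) == "bar":
                self._blah_matches = True
        elif name == "blah":
            if self._blah_matches:
                self.matches += 1
            self._in_blah = False


def build_document(rows):
    """Builds a test document in which only the last <blah> matches."""
    parts = ["<root>"]
    for i in range(rows):
        value = "bar" if i == rows - 1 else "other"
        parts.append(f"<blah><foo>{value}</foo></blah>")
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


handler = BlahFilter()
xml.sax.parseString(build_document(1000), handler)
# One match, but thousands of events had to be consumed to find it.
print(handler.matches, handler.events_seen)
```

The ratio of events_seen to matches is the overhead being complained
about: it grows with the size of the source, not with the size of the
answer.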

This is where pull parsing would really rock; the problem is that such a
pull parser is, in fact, a small xml database.

And there might be a pretty big overhead in creating a small database
(say, the equivalent of Xalan DTM or even that one) in order to
facilitate indexing.

But maybe, that's exactly what Xalan does internally for the document()
function, I really don't know.
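
For contrast, here is a rough sketch of the consumer-driven style, with
the caveat that it is not XPP and says nothing about what Xalan does
internally. It uses Python's xml.etree.ElementTree.XMLPullParser, and
the function name find_first and the chunk size are assumptions made
for illustration. The consumer decides how much input to feed and stops
as soon as it has its answer. Note that the parser still builds a tree
of everything it has seen so far, which is the "small database" cost
mentioned above.

```python
import xml.etree.ElementTree as ET


def find_first(xml_text, chunk_size=64):
    """Feeds input in small chunks and stops at the first blah[foo='bar'].

    Returns (element_or_None, characters_fed). Hypothetical helper.
    """
    parser = ET.XMLPullParser(events=("end",))
    fed = 0
    for start in range(0, len(xml_text), chunk_size):
        chunk = xml_text[start:start + chunk_size]
        parser.feed(chunk)
        fed += len(chunk)
        # Pull whatever complete elements the parser has so far.
        for _event, elem in parser.read_events():
            if elem.tag == "blah" and elem.findtext("foo") == "bar":
                return elem, fed   # early exit: the rest is never parsed
    return None, fed


# The match is the first element; a thousand non-matching ones follow.
doc = ("<root><blah><foo>bar</foo></blah>"
       + "<blah><foo>other</foo></blah>" * 1000
       + "</root>")
elem, fed = find_first(doc)
print(elem is not None, fed, len(doc))
```

When the match sits near the start of the source, only a small prefix
is ever parsed. When it sits at the end, nothing is saved, so the
benefit depends entirely on where the data lives.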

Still, my point remains: the underlying amount of work the system has to
do to come out with a simple variable using document() is incredible
compared to the use of a simple method call of a taglib.

document() keeps on looking like a golden hammer antipattern to me.

I think it would make perfect architectural sense as an interface to
access a real xml database, but for accessing something like an xml-ized
representation of session content, well, I'm not sure.

Still I see one big value in this: usability during development. It's
nice to divide your problem into different pipelines because you can
reuse them and tune them as you go and look at them in your browser
directly (or with views).

this is admittedly very attractive.

but I'm thinking that a jxpath-transformer alternative could well be
better.... even if, at that point, the similarity between the jxpath
syntax and xslt forces you to do stuff like

 <img src="{id}/{string('{id}')}"/>

so that the first {id} represents the value of the 'id' element of the
input stream of events, and the second one is escaped and further
processed by the jxpath transformer which is pipelined after the xslt one.

Still, even the jxpath approach has the pretty nasty problem of having
to iterate over the whole stream of events to find out which ones to
substitute. Another performance problem, especially for namespaced
attributes, which are very slow to process in SAX since they are not
sent as events.
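
A minimal sketch of what such a substituting stage has to do, under
loud assumptions: this is not Cocoon's transformer API and it does not
evaluate real JXPath expressions. It is a Python xml.sax filter (the
names SubstitutingFilter and substitute are hypothetical) that replaces
{name} placeholders in attribute values from a plain dictionary. The
point is only that every element's attributes must be inspected, even
when a single attribute in the whole stream carries a placeholder.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Matches {name}; a stand-in for a real expression syntax.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


class SubstitutingFilter(XMLFilterBase):
    """Rewrites {name} placeholders in attribute values (illustration)."""

    def __init__(self, parent, values):
        super().__init__(parent)
        self.values = values
        self.elements_inspected = 0

    def _expand(self, text):
        # Unknown names are left untouched rather than raising.
        return PLACEHOLDER.sub(
            lambda m: str(self.values.get(m.group(1), m.group(0))), text)

    def startElement(self, name, attrs):
        # Every element passes through here, placeholder or not.
        self.elements_inspected += 1
        replaced = {key: self._expand(value) for key, value in attrs.items()}
        super().startElement(name, AttributesImpl(replaced))


def substitute(xml_text, values):
    """Runs the filter over xml_text; returns (output, elements_inspected)."""
    out = io.StringIO()
    filt = SubstitutingFilter(xml.sax.make_parser(), values)
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue(), filt.elements_inspected


result, inspected = substitute(
    '<page><img src="{id}/pic.png"/><p>text</p></page>', {"id": 42})
print(result)
print(inspected)
```

Three elements are inspected to rewrite one attribute; on a real page
the ratio would be far worse, which is the iteration cost described
above.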

I really don't know. I think that, at this point, we need numbers to
know what's really going on: numbers that compare an XSLT/document()
approach against a jxpath approach.

Anybody want to volunteer to benchmark this? ;-)

-- 
Stefano.


