cocoon-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: [RT] Access to the object model in XTTL
Date Sat, 12 Apr 2003 16:34:49 GMT
On 4/10/03 3:46 PM, Daniel Fagerstrom wrote:


>    In the discussion about XTTL (eXtensible Transformation and Template 
> Language, i.e. using XSLT with nicer syntax as template language in 
> Cocoon), access to the Cocoon object model (OM) seems to be the 
> major design challenge. So I think it might be a good idea to take a 
> closer look at the issues.

I've taken a pretty serious look at XQuery and I think that it fits the
needs for what I wanted. Ivelin wrote a pretty nice article on it for
xml.com, you might want to check it out.

> We start by discussing what data should be accessible from XTTL and 
> how it could be packaged, and then continue into the more technically 
> involved questions about how the XSLT processor can access the data in 
> an efficient way.

There is an implementation of XQuery that compiles the XQuery template
(for lack of a better term) into bytecode, passing through Kawa, which
is a Scheme-to-Java compiler.

Look at it here

http://www.gnu.org/software/qexo/

The concept is pretty cool, even if it doesn't touch the issues with
data manipulation.

Another implementation (which also has Cocoon 1.x hooks!) is

http://kweelt.sourceforge.net/

They are both GPLed, and Kweelt seems pretty much dead (which might be a
good thing, since they might be interested in donating the code to us if
we ask).

Another one is http://www.fatdog.com/, again GPLed.

All of them are written in Java.

> What data?
> ----------
> For the template language to be useful it should at least give read 
> access to the request and the session object in the object model. It 
> should also give read access to the bean dictionary and the continuation 
> from flowscripts. This basically means that data from all the get methods 
> on the mentioned objects should be made available. There is no need for 
> write access anymore, as all operations with side effects are preferably 
> performed in flowscripts.

This is probably true. I would start with this assumption and then add
things if somebody can prove a real need for it which can't be solved
alternatively.

> I will, somewhat sloppily, refer to all data that we make accessible to 
> XTTL as the OM.

Ok

> For some parts of the above mentioned data it is useful to provide 
> several views. As an example, the StreamGenerator can be asked to parse 
> xml data that is contained in a text field. If the request parameter 
> names are absolute xpaths, e.g. "/foo/bar[2]", an xml document can be 
> built from the name/value pairs; something similar is done in XMLForms. 
> The input stream in the request object can also be presented in several 
> different ways depending on its mime type. These examples show that it 
> can be useful to provide several views of the same data.
> 
> How should the data be represented?
> -----------------------------------
> There are two main choices for how to represent the OM data: as Java 
> object structures or as XML.
> 
> I strongly suggest choosing XML format:
> * Better coupling to XSLT, XPath can be used on all data
> * Better SOC, the XTTL writer should not need to know anything about 
> Java it should be enough to know the XML model of the OM
> * Better protection against misuse, if the Java objects are available 
> within the XSLT processor, the user can make all kinds of side effect 
> operations on the data with extension functions. This is not possible if 
> the data is  supplied as XML.
> 
> Also, as I have discussed in lots of earlier mails, I believe that the 
> current practice of representing data as maps with string values will 
> more and more be replaced by using XML as the universal data format 
> within webapps. This is one of the main reasons for my interest in 
> Stefano's proposal: if we just see XTTL as a means to access string values 
> in maps in the request object, it will be like any other template 
> language (although possibly slower). If we instead provide XML adapters 
> for the OM objects, make it easy to access XML that is posted to the 
> webapp, and especially if we provide XML adapters for the data structures 
> that are submitted from the flowscripts, then XTTL will really be far 
> ahead of the mainstream template languages.

Logically speaking, I agree that providing a coherent, tree-shaped and
read-only view of data from a template language makes perfect sense.

Still, there are huge performance issues on the table. They are somewhat
related to the push vs. pull debate, which often tends to turn into
philosophical questions.

> Ok, given that the OM should be represented as XML, there are still some 
> other design considerations:
> 
> Tree or forest ?
> ----------------
> The XML view of the OM data can either be represented as one large XML 
> tree containing all the XML views of the OM objects as subtrees, or as 
> one XML object for each OM object (or even parts of the OM objects). I 
> think I prefer the forest view, as it suggests a more modular 
> architecture for the various OM-object-to-XML adapters.

You are stating that there is a difference between a folder and a
document, basically. Logically speaking, I think this is just another
piece of metadata on top of a node and should not be reflected by the
underlying syntax.

This is also the road taken by JSR 170 in designing the Repository API,
which is, in short, a huge tree with granularity down to the single text
node of a DOM or to a single MPEG frame of a multi-Gb video stream.

The problem they have is that they are now so abstract they are not sure
(yet) on how to query it :-) (unless they provide XML-ized views even
for those non-xml nodes, very nasty problem)

> DOM or SAX?
> -----------
> Should the XML adapter provide SAX events or a DOM tree? That really 
> depends. For a Java object data structure a DOM adapter (e.g. 
> domify.sf.net), is probably the best bet. Especially if only a small 
> part of the data structure is accessed. A DOM adapter basically 
> constructs small adapter objects (that implement the appropriate DOM 
> interfaces) when the different parts are accessed. As an implementation 
> note: both Xalan and XSLTC have internal DOM to DTM adapters (DTM is 
> their internal representation of input XML), so if a DOM document is 
> supplied to the processor it doesn't have to build an internal copy.
> 
> SAX is definitely a better choice for output from jdbc row sets or 
> parser output and the like. It is probably more efficient than a DOM 
> adapter if all of the XML tree is going to be used by the XSLT processor 
> anyway, as no adapter objects need to be constructed. An example of a 
> SAX adapter is Castor.
> 
> Both SAX and DOM adapters could of course be either reflection-based or 
> special-purpose built for your specific Java classes.

The problem is not SAX or DOM. It's much worse than this.

Suppose you have an XQuery template with something like

 <html xmlns:c="http://apache.org/cocoon/xquery/object-model">
  <body>
   <form action="$c:om/flow/continuation/id" >
    <input type="text" name="skin"
value="{$c:om/session/style//profile[name='skin']}"/>
   </form>
  </body>
 </html>

From what I read in the XQuery specs, the above is legal. If not, an
alternative could be to use

 <html xmlns:c="http://apache.org/cocoon/xquery/object-model">
  <body>
   <form action="c:om()/flow/continuation/id" >
    <input type="text" name="blah"
value="c:om()/session/style//profile[name='skin']"/>
   </form>
  </body>
 </html>

Now: how would you implement the above? SAX or DOM?

> Input to the XSLT processor
> ---------------------------
> There are three different mechanisms for supplying external data to an 
> XSLT processor: the input document, the uri resolver and parameters.
> 
> Input document
> - - - - - - - -
> If one uses the input document for supplying the OM it must be in XML 
> form. If it is a DOM tree one can use a 
> javax.xml.transform.dom.DOMSource as input to the transformation, and if 
> it is SAX one uses a content handler or javax.xml.transform.sax.SAXSource 
> instead. The XSLT processor wrapper from Excalibur that is used in 
> Cocoon only supports SAX, so if we choose this path it needs to be 
> extended to handle DOM as well if we want to use DOM as input. In this 
> solution the OM must of course be supplied as one large XML document. 
> This would mean that all of the input would have to be either SAX or DOM; 
> it wouldn't be possible to choose SAX for some input and DOM for others.
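[Editorial note: the input-document route Daniel describes can be sketched with
plain TrAX (JAXP) APIs; the tiny OM document below is of course a made-up
stand-in for a real XML-ized object model.]

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomInputDemo {

    // Transform a DOM tree (standing in for an XML-ized OM) directly,
    // so the processor walks the tree instead of re-parsing a stream.
    static String render() throws Exception {
        Document om = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<om><request><foo>bar</foo></request></om>")));

        String xsl =
            "<xsl:stylesheet version='1.0'"
          + "  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "  <xsl:output method='text'/>"
          + "  <xsl:template match='/'>"
          + "    <xsl:value-of select='/om/request/foo'/>"
          + "  </xsl:template>"
          + "</xsl:stylesheet>";

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));

        StringWriter out = new StringWriter();
        t.transform(new DOMSource(om), new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());  // prints "bar"
    }
}
```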
> 
> The document function
> - - - - - - - - - - -
> One can also use the document(uri) function for getting XML data into the 
> XSLT processor. The uri that is used as argument is resolved to a 
> DOMSource or a SAXSource (or a StreamSource) by a resolver that 
> implements javax.xml.transform.URIResolver and is supplied to the XSLT 
> processor. Some work would be needed here too, as the URIResolver that 
> is supplied to the XSLT processor wrapper in the Excalibur 
> implementation only handles a StreamSource. Using the document function 
> allows both for a single tree and a forest view of the OM. In the latter 
> case there is a need for a uri scheme for accessing the different parts, 
> e.g. "document(request:/param)".
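[Editorial note: a minimal sketch of the URIResolver route, using only
standard TrAX APIs; the "request:" scheme is hypothetical, and a real
adapter would wrap the live Request object rather than a literal string.]

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ResolverDemo {

    // Resolve a made-up "request:" uri scheme from inside document().
    static String render() throws Exception {
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + "  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "  <xsl:output method='text'/>"
          + "  <xsl:template match='/'>"
          + "    <xsl:value-of select=\"document('request:/param')/request/foo\"/>"
          + "  </xsl:template>"
          + "</xsl:stylesheet>";

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));

        // The resolver maps our scheme to an XML view of the request.
        t.setURIResolver((href, base) -> {
            if (href.startsWith("request:")) {
                return new StreamSource(new StringReader(
                        "<request><foo>bar</foo></request>"));
            }
            return null;  // fall back to the default resolver
        });

        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<doc/>")),
                    new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());  // prints "bar"
    }
}
```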
> 
> Parameters
> - - - - - -
> In Xalan and XSLTC, any type of Java object can be supplied to the XSLT 
> processor as a named parameter. The externally supplied parameters must 
> be declared with xsl:param statements at the global level to be used in 
> the rest of the stylesheet; this rules out simplified stylesheets (the 
> ones without an enclosing xsl:stylesheet). Types like String, Boolean, 
> Integer, Node and NodeIterator will be adapted to the corresponding XPath 
> types and accessed with ordinary xpath expressions. Types that don't 
> correspond to any XPath type can still be used by extension functions; 
> reflection is used to find the right extension function.
> 
> If we supply the object model as a parameter with the name "om", we can 
> then access a request parameter "foo" by writing 
> "java:org.apache.cocoon.environment.Request.getParameter(java:java.util.Map.get($om, 
> "request"), "foo")". This requires that the parameter "om" is declared 
> in the stylesheet and that the namespace "java" is defined. There is 
> also a namespace mechanism in Xalan that makes it possible to use a 
> short namespace identifier instead of "package.Class".
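[Editorial note: the plain parameter-passing half of this can be sketched with
standard TrAX APIs; the java: extension-function syntax above is Xalan-specific
and not shown here.]

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamDemo {

    // Pass an externally supplied value as a named stylesheet parameter.
    static String render() throws Exception {
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + "  xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "  <xsl:output method='text'/>"
          + "  <xsl:param name='skin'/>"  // must be declared globally
          + "  <xsl:template match='/'>"
          + "    <xsl:value-of select='$skin'/>"
          + "  </xsl:template>"
          + "</xsl:stylesheet>";

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setParameter("skin", "dark");  // a String becomes an XPath string

        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<doc/>")),
                    new StreamResult(out));
        return out.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());  // prints "dark"
    }
}
```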

I don't get this, can you elaborate more?

> 
> Cashing
> --------

you mean: where you get the money? ;-) Sorry, couldn't resist.

> I think there are two main ways to find out what data the caching of 
> XTTL should be based on: 
> one could explicitly list what data the caching should depend on, e.g. in 
> the sitemap, or the caching keys could be inferred from the XTTL 
> document. It might also be a good idea to be able to turn caching on and 
> off for the XTTL generator, as it might be worthwhile to do a fairly 
> complicated validity calculation for a page that is heavy to generate, 
> but not for a page that is cheap to regenerate (maybe there are some 
> general mechanisms in the sitemap that already do that?).

XSP contains all this already.

As a sidenote I have been thinking that making a non-xml syntax for XSP
might even be better because it was designed for generation and it does
have a bunch of machinery in place already.

> For caching to be efficient, the cache validity calculation must be as 
> specific as possible, i.e. the calculation should only take into account 
> data that the page depends on. If it depends on a larger set of 
> data, it is more costly to calculate the validity, and it also means that 
> a cached page might be considered invalid because some data has changed 
> that it is not dependent on.
> 
> XSLT processing is done in two steps, first a Transformer is created 
> based on the XSLT document, in XSLTC the XSLT document is compiled into 
> byte code during this step, and in Xalan it is compiled into some kind 
> of data structure. We call this the compilation step. Then the 
> transformer can be used each time an XML document shall be processed as 
> described by the XSLT document, we call this the execution step.
> 
> In the TraxGenerator the compilation is performed during pipeline 
> setup time, and the cache validity object must be based on information 
> that is available during that time. AFAIU caching can only depend on 
> data structures that are known at sitemap setup time (any cache gurus 
> reading this that can comment?).

Caching depends on the caching logic implemented by the various modules.
In order for an XSLT stylesheet to be cacheable, it should implement the
proper hooks so that the pipeline can call it.

This sounds rather unfeasible to me.

[snip]

>>suppose you have an XSP-oriented approach: you do
>>
>><request:parameter name="blah"/>
>>
>>which gets translated into something like
>>
>>output.characters(request.getParameter("blah").toChar());
>>
>>
>>now you have something like
>>
>>{request/blah}
>>
>>which is translated into
>>
>><xsl:value-of select="document('cocoon://request/blah')"/>
>>
>>which will:
>>
>>1) take the request object
>>
>>2) saxify it to an xpath engine
>>
>>3) wait for the right event to come
>>
>>4) create a nodeset
>>
>>5) pass it to the stylesheet
>>
>>6) lookup the node value
>>
>>This is potentially *orders of magnitude* slower.
>>
> 
> We wouldn't use the cocoon: protocol but a special purpose om: or maybe 
> request: protocol that would return the actual request object wrapped in 
> a thin (lazy) DOM adapter. 

ok

> This DOM adapted request object will in turn 
> be used internally in Xalan or XSLTC as representation of the data, 
> wrapped in a thin DOM to DTM adapter. This would not be as efficient as 
> the XSP-approach, but it would be far from as bad as suggested above.

hmmmm

> We should also put the efficiency discussion in some context. If we only 
> want to replace the current typical use of XSP, with lots of accesses to 
> simple data types like strings, it would probably be hard to beat XSP's 
> performance with XSLT. But if we are interested in using XML-adapted data 
> from the flowscript layer, or XML-adapted input data in a web service 
> context, XSLT already has all the mechanisms that are needed for 
> handling XML input, and it is much better to reuse the software 
> development invested in XSLT processors than to try to extend XSP (or 
> something similar) for these kinds of applications.

But what about reversing the picture entirely and building a non-xml
syntax on top of XSP?

> We should most definitely choose a design that can be implemented in an 
> efficient way. But as always we should let actual user needs and 
> evolution guide the actual optimization work.

I agree with this.

> Conclusion
> ----------
> My current opinions are as follows:
> 
> * All input should be XML-ized; I believe that it allows for a much 
> neater way to write webapps

I do agree there is a potential benefit in this, despite the performance
implications.

> and if we don't, I see no reason for using 
> XSLT as a template language.

granted.

> * Using the input document for supplying OM data is impractical, as it 
> requires all input to be either SAX or DOM, while a mix seems to be much 
> more useful.

agreed

> * If the document function can supply the XTTL generator with caching 
> info, I believe that using the document function for OM input is the most 
> attractive way.

I'm pretty positive this is not fully possible: the document() function
expects sources, which are not cacheable with Cocoon's highly abstract
strategy. They somewhat expect streams and estimate their ergodic period
using last-modification time. Cocoon is much more abstract than this,
and it's not always possible to project it onto a last-modified time.

But this is a rather theoretical point. Still, those templates will
probably never be cached if they use document() somewhere, but this might
not necessarily be a bad thing, since most XSPs today don't implement
cacheable anyway and nobody cries about Cocoon performance.

> * If not, I don't think it matters that much; both could use the DOM-ified 
> data described above. What speaks for the document function is that you 
> don't have to write any global xsl:param, and therefore can use the 
> simplified XSLT format without an enclosing xsl:stylesheet. An advantage 
> of parameters is that they can contain node-sets as well as nodes, so 
> that you don't need to use the irritating root node all the time while 
> accessing data.

Good point.

>                                    -- o --
> 
> IMO the large outstanding question is how to handle caching. For this it 
> might be a good idea to discuss with the Xalan community.

oh, well, I would not expect Xalan to implement a cacheable interface
defined by this project (not counting the circular dependency that would
be introduced!)

> We will need to write possibly caching-aware XML adapters both if we 
> use the document function and if we use parameters; the only difference 
> is whether the XML-adapted data is wrapped into a source and made 
> available from the source resolver, or given as an argument to the XSLT 
> processor.
> 
> So, let us discuss what data should be supplied to the XTTL generator 
> and let us start to design and implement XML adapters.

Daniel, while I do like this thread, I think there are much more
important issues to deal with right now. So, please, hold off on any
high-research RTs until we release, or my lack of time will force me to
ignore your messages and a bunch of nice thinking could go to waste.

TIA

-- 
Stefano.


