cocoon-dev mailing list archives

From Daniel Fagerstrom <dani...@nada.kth.se>
Subject [RT] Access to the object model in XTTL (was: Re: [RT] the quest for the perfect template language)
Date Thu, 10 Apr 2003 13:46:52 GMT
In the discussion about XTTL (eXtensible Transformation and Template 
Language, i.e. using XSLT with a nicer syntax as the template language in 
Cocoon), access to the Cocoon object model (OM) seems to be the 
major design challenge. So I think it might be a good idea to take a 
closer look at the issues.

We start by discussing what data should be accessible from XTTL and 
how it could be packaged, and then continue into the more technically 
involved questions about how the XSLT processor can access the data in 
an efficient way.


What data?
----------
For the template language to be useful it should at least give read 
access to the request and the session objects in the object model. It 
should also give read access to the bean dictionary and the continuation 
from flowscripts. This basically means that data from all the get methods 
in the mentioned objects should be made available. There is no need for 
write access anymore, as all operations with side effects are preferably 
performed in flowscripts.

I will, somewhat sloppily, refer to all data that we make accessible to 
XTTL as the OM.

For some parts of the above-mentioned data it is useful to provide 
several views. As an example, the StreamGenerator can be asked to parse 
XML data that is contained in a text field. If the request parameter 
names are absolute XPaths, e.g. "/foo/bar[2]", an XML document can be 
built from the name/value pairs; something similar is done in XMLForms. 
The input stream in the request object can also be presented in several 
different ways depending on its MIME type. These examples show that it 
can be useful to provide several views of the same data.
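To make the XPath-named-parameters idea concrete, here is a minimal sketch (the class is a hypothetical helper, not the XMLForms code; it handles only simple paths and ignores predicates like "[2]") that builds a DOM document from such name/value pairs:

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class ParamsToXml {
    // Build a DOM view of name/value pairs whose names are simple
    // absolute paths like "/foo/bar". Predicates such as [2] are
    // not handled in this sketch.
    public static Document build(Map<String, String> params) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        for (Map.Entry<String, String> e : params.entrySet()) {
            Node current = doc;
            for (String step : e.getKey().substring(1).split("/")) {
                Node child = findChild(current, step);
                if (child == null) {
                    child = doc.createElement(step);
                    current.appendChild(child);
                }
                current = child;
            }
            current.setTextContent(e.getValue());
        }
        return doc;
    }

    private static Node findChild(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(name)) {
                return n;
            }
        }
        return null;
    }
}
```

So the pair ("/foo/bar", "42") becomes the document <foo><bar>42</bar></foo>, and parameters sharing a prefix end up under the same parent element.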

How should the data be represented?
-----------------------------------
There are two main choices for how to represent the OM data: as Java 
object structures or as XML.

I strongly suggest choosing the XML format:
* Better coupling to XSLT: XPath can be used on all data.
* Better SoC: the XTTL writer should not need to know anything about 
Java; it should be enough to know the XML model of the OM.
* Better protection against misuse: if the Java objects are available 
within the XSLT processor, the user can perform all kinds of side-effect 
operations on the data with extension functions. This is not possible if 
the data is supplied as XML.

Also, as I have discussed in lots of earlier mails, I believe that the 
current practice of representing data as maps with string values will 
more and more be replaced by using XML as the universal data format 
within webapps. This is one of the main reasons for my interest in 
Stefano's proposal: if we just see XTTL as a means to access string values 
in maps in the request object, it will be like any other template 
language (although possibly slower). If we instead provide XML adapters 
for the OM objects, make it easy to access XML that is posted to the 
webapp, and especially if we provide XML adapters for the data structures 
that are submitted from the flowscripts, then XTTL will really be far 
ahead of the mainstream template languages.

Ok, given that the OM should be represented as XML, there are still some 
other design considerations:

Tree or forest ?
----------------
The XML view of the OM data can either be represented as one large XML 
tree containing all the XML views of the OM objects as subtrees, or as 
one XML object for each OM object (or even for parts of the OM objects). 
I think I prefer the forest view, as it suggests a more modular 
architecture for the various OM-object-to-XML adapters.

DOM or SAX?
-----------
Should the XML adapter provide SAX events or a DOM tree? That really 
depends. For a Java object data structure a DOM adapter (e.g. 
domify.sf.net) is probably the best bet, especially if only a small 
part of the data structure is accessed. A DOM adapter basically 
constructs small adapter objects (that implement the appropriate DOM 
interfaces) when the different parts are accessed. As an implementation 
note: both Xalan and XSLTC have internal DOM to DTM adapters (DTM is 
their internal representation of input XML), so if a DOM document is 
supplied to the processor it doesn't have to build an internal copy.

SAX is definitely a better choice for output from JDBC row sets, 
parser output and the like. It is probably more efficient than a DOM 
adapter if all of the XML tree is going to be used by the XSLT processor 
anyway, as no adapter objects need to be constructed. An example of a 
SAX adapter is Castor.

Both SAX and DOM adapters could of course be either reflection based or 
purpose built for your specific Java classes.
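A minimal sketch of a SAX adapter, assuming the map keys are valid element names (plain JAXP, not the actual Cocoon or Castor classes): it poses as an XMLReader whose parse() pushes the map entries as events instead of parsing real input, which is exactly how a SAX adapter plugs into a TrAX processor.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class MapSaxAdapter extends XMLFilterImpl {
    private final Map<String, String> map;

    public MapSaxAdapter(Map<String, String> map) { this.map = map; }

    // Instead of parsing real input, push the map entries as SAX events
    // to whatever ContentHandler the processor has registered.
    @Override
    public void parse(InputSource ignored) throws SAXException {
        ContentHandler h = getContentHandler();
        AttributesImpl noAttrs = new AttributesImpl();
        h.startDocument();
        h.startElement("", "request", "request", noAttrs);
        for (Map.Entry<String, String> e : map.entrySet()) {
            h.startElement("", e.getKey(), e.getKey(), noAttrs);
            char[] value = e.getValue().toCharArray();
            h.characters(value, 0, value.length);
            h.endElement("", e.getKey(), e.getKey());
        }
        h.endElement("", "request", "request");
        h.endDocument();
    }

    // Accept any feature/property the processor tries to set; there is
    // no parent XMLReader to delegate to.
    @Override public void setFeature(String name, boolean value) { }
    @Override public boolean getFeature(String name) { return false; }
    @Override public void setProperty(String name, Object value) { }
    @Override public Object getProperty(String name) { return null; }

    // Serialize through an identity transformer, just to show the
    // adapter feeding a TrAX processor end to end.
    public static String toXml(Map<String, String> map) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(new MapSaxAdapter(map), new InputSource()),
                           new StreamResult(out));
        return out.toString();
    }
}
```

No adapter objects survive the transformation; the events are consumed as they are produced, which is the efficiency argument made above.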

Input to the XSLT processor
---------------------------
There are three different mechanisms for supplying external data to an 
XSLT processor: the input document, the uri resolver and parameters.

Input document
- - - - - - - -
If one uses the input document for supplying the OM, it must be in XML 
form. If it is a DOM tree one can use a 
javax.xml.transform.dom.DOMSource as input to the transformation, and if 
it is SAX one uses a content handler or a javax.xml.transform.sax.SAXSource 
instead. The XSLT processor wrapper from Excalibur that is used in 
Cocoon only supports SAX, so if we choose this path it needs to be 
extended to handle DOM as well if we want to use DOM as input. In this 
solution the OM must of course be supplied as one large XML document. 
This would mean that all of the input would have to be either SAX or DOM; 
it wouldn't be possible to choose SAX for some input and DOM for other.
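A minimal, self-contained sketch of this route using plain JAXP (the stylesheet and the small OM document are made up for illustration; the Excalibur wrapper is not involved here):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomInputExample {
    public static String run() throws Exception {
        // A DOM tree standing in for an XML-ized OM object.
        Document om = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<request><param name='foo'>bar</param></request>")));

        // A stylesheet that reads OM data from its input document.
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select=\"/request/param[@name='foo']\"/>"
                + "</xsl:template></xsl:stylesheet>";

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        // The OM is supplied as the transformation's input document.
        t.transform(new DOMSource(om), new StreamResult(out));
        return out.toString();
    }
}
```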

The document function
- - - - - - - - - - -
One can also use the document(uri) function for getting XML data into 
the XSLT processor. The URI that is used as the argument is resolved to a 
DOMSource or a SAXSource (or a StreamSource) by a resolver, implementing 
javax.xml.transform.URIResolver, that is supplied to the XSLT 
processor. Some work would be needed here as well, as the URIResolver 
that is supplied to the XSLT processor wrapper in the Excalibur 
implementation only handles StreamSources. Using the document function 
allows both for a single-tree and a forest view of the OM. In the latter 
case there is a need for a URI scheme for accessing the different parts, 
e.g. "document(request:/param)".
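A sketch of what such a scheme could look like with plain JAXP (the om: scheme and the canned request document are assumptions for illustration; a real resolver would wrap the live request object in an XML adapter):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OmResolverExample {
    public static String run() throws Exception {
        // Resolver for a made-up "om:" scheme; here it returns a canned
        // document where a real one would adapt the request object.
        URIResolver omResolver = (href, base) -> {
            if (href.startsWith("om:")) {
                return new StreamSource(new StringReader(
                        "<request><param><foo>bar</foo></param></request>"));
            }
            return null;  // fall back to default resolution
        };

        // The stylesheet pulls OM data through the document() function.
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select=\"document('om:request')/request/param/foo\"/>"
                + "</xsl:template></xsl:stylesheet>";

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setURIResolver(omResolver);  // consulted by document() at run time

        StringWriter out = new StringWriter();
        // The main input document is a dummy; all data arrives via document().
        t.transform(new StreamSource(new StringReader("<dummy/>")),
                    new StreamResult(out));
        return out.toString();
    }
}
```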

Parameters
- - - - - -
In Xalan and XSLTC, any type of Java object can be supplied to the XSLT 
processor as a named parameter. The externally supplied parameters must 
be declared with xsl:param statements at the global level to be usable in 
the rest of the stylesheet; this rules out simplified stylesheets (the 
ones without an enclosing xsl:stylesheet). Types like String, Boolean, 
Integer, Node and NodeIterator will be adapted to the corresponding XPath 
types and accessed with ordinary XPath expressions. Types that don't 
correspond to any XPath type can still be used by extension functions; 
reflection is used to find the right extension function.

If we supply the object model as a parameter with the name "om", we can 
then access a request parameter "foo" by writing 
"java:org.apache.cocoon.environment.Request.getParameter(java:java.util.Map.get($om, 
'request'), 'foo')". This requires that the parameter "om" is declared 
in the stylesheet and that the namespace "java" is defined. There is 
also a namespace mechanism in Xalan that makes it possible to use a 
short namespace identifier instead of "package.Class".
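A portable sketch of the parameter mechanism using a plain String value (node-set parameters and the java: extension namespace are Xalan-specific and not shown here):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParamExample {
    public static String run() throws Exception {
        // The global xsl:param declaration is what rules out
        // simplified stylesheets for this mechanism.
        String xsl = "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:param name='foo'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select='$foo'/>"
                + "</xsl:template></xsl:stylesheet>";

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setParameter("foo", "bar");  // externally supplied value

        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<dummy/>")),
                    new StreamResult(out));
        return out.toString();
    }
}
```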


Caching
-------
I think there are two main ways to find out what data the caching of 
XTTL should be based on: one could explicitly list what data the 
caching should depend on, e.g. in the sitemap, or the caching keys 
could be inferred from the XTTL document. It might also be a good idea 
to be able to turn caching on and off for the XTTL generator, as it 
might be worthwhile to do a fairly complicated validity calculation for 
a page that is heavy to generate, but not for a page that is cheap to 
regenerate (maybe there are some general mechanisms in the sitemap that 
already do that?).

For caching to be efficient, the cache validity calculation must be as 
specific as possible, i.e. the calculation should only take data that 
the page depends on into account. If it depends on a larger set of 
data, it is more costly to calculate the validity, and it also means 
that a cached page might be considered invalid because some data has 
changed that it is not dependent on.

XSLT processing is done in two steps. First a Transformer is created 
based on the XSLT document; in XSLTC the XSLT document is compiled into 
byte code during this step, and in Xalan it is compiled into some kind 
of data structure. We call this the compilation step. Then the 
transformer can be used each time an XML document shall be processed as 
described by the XSLT document; we call this the execution step.

In the TraxGenerator the compilation is performed during pipeline 
setup time, and the cache validity object must be based on information 
that is available during that time. AFAIU, caching can only depend on 
data structures that are known at sitemap setup time (can any cache 
gurus reading this comment?).

Explicit caching configuration
- - - - - - - - - - - - - - - -
The cache validity calculation is based on sitemap parameters in the 
current implementation of the TraxTransformer. One can add parameters to 
the XSLT processor and affect the cache validity calculation by using 
the configuration parameters use-request-parameters, use-cookies and 
use-session-info. We could use something similar for the XTTL generator.
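As a hypothetical illustration of why explicit configuration keeps the validity calculation specific (the class and method names are made up, not Cocoon's cache API): the validity key covers only the request parameters the configuration declares as dependencies, so unrelated parameters cannot invalidate the cached page.

```java
import java.util.List;
import java.util.Map;

public class XttlCacheKey {
    // Build a validity key from the declared dependencies only; a
    // parameter not listed in the configuration never affects the key.
    public static String validityKey(Map<String, String> requestParams,
                                     List<String> declaredDependencies) {
        StringBuilder key = new StringBuilder();
        for (String name : declaredDependencies) {
            key.append(name).append('=')
               .append(requestParams.get(name)).append(';');
        }
        return key.toString();
    }
}
```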

Inferred caching configuration
- - - - - - - - - - - - - - -
One could also infer what information the XTTL document depends on 
by analyzing it; this could either be done by the XSLT processor or by a 
special preprocessing step performed during sitemap setup.

For the document function with an argument that is known at compile 
time (e.g. a literal URI string), the XSLT processor could, AFAIU, call 
the URIResolver at compile time, and in that way the XSLT processor 
wrapper from Excalibur would be informed that the URI is used in the 
stylesheet. The XSLT processor wrapper supplies its own implementation 
of the URIResolver interface that registers all URI lookups, so that the 
caching can depend on these URIs. This mechanism is already used for the 
handling of xsl:include and xsl:import; I have not checked if it is used 
for the document function, otherwise we could discuss with the Xalan 
developers whether it would be possible.

Say that we use the document function to access the OM and that it 
handles caching in a nice way. Then if we used a fine-grained URI 
scheme for the OM, it would be possible for XTTL to infer quite 
specific caching information. E.g. if we wrote 
"document(om:/request/param/foo)" instead of 
"document(om:/)/request/param/foo", the XTTL generator could base its 
caching on much less information (the specific request parameter instead 
of all of the OM).

If we instead supply the OM as input to the XSLT processor, the only way 
I can think of to infer what data the XSLT stylesheet depends on would 
be to write a component that scans the XSLT document during setup time 
and infers its dependencies from the XSLT code. A solution that seems 
fairly complicated to me.


                                  -- o --

Now that we have some more background information, let's return to 
Stefano's worries in some earlier emails:

Stefano Mazzocchi wrote:

>on 4/6/03 11:48 AM danielf@lentus.se wrote:
>
>>In another mail you said:
>> > Two things worry me:
>> >
>> >  1) performance: document() is a push-based function. We can't have
>> >  a call to a cocoon pipeline for each and every variable I need.
>>
>>Variables - request parameters e.g., are read from a source that 
>>basically is a adapts the request object to dom tree or sax events, I 
>>don't see why that would be slow or why the cocoon pipeline should be 
>>called. Or did I miss something?
>>    
>>
>
>suppose you have an XSP-oriented approach: you do
>
> <request:parameter name="blah"/>
>
>which gets translated into something like
>
> output.character(request.getParameter("blah").toChar());
>
>
>
>
>now you have something like
>
> {request/blah}
>
>which is translated into
>
> <xsl:value-of select="document('cocoon://request/blah')"/>
>
>which will:
>
> 1) take the request object
>
> 2) saxify it to an xpath engine
>
> 3) wait for the right event to come
>
> 4) create a nodeset
>
> 5) pass it to the stylesheet
>
> 6) lookup the node value
>
>This is potentially *orders of magnitude* slower.
>
We wouldn't use the cocoon: protocol but a special-purpose om: or maybe 
request: protocol that would return the actual request object wrapped in 
a thin (lazy) DOM adapter. This DOM-adapted request object will in turn 
be used internally in Xalan or XSLTC as the representation of the data, 
wrapped in a thin DOM to DTM adapter. This would not be as efficient as 
the XSP approach, but it would be far from as bad as suggested above.

We should also put the efficiency discussion in some context. If we only 
want to replace the current typical use of XSP with lots of accesses to 
simple data types like strings, it would probably be hard to beat XSP's 
performance with XSLT. But if we are interested in using XML-adapted data 
from the flowscript layer, or XML-adapted input data in a web service 
context, XSLT already has all the mechanisms that are needed for 
handling XML input, and it is much better to reuse the software 
development invested in XSLT processors than to try to extend XSP (or 
something similar) for these kinds of applications.

We should most definitely choose a design that can be implemented in an 
efficient way. But as always, we should let actual user needs and 
evolution guide the actual optimization work.

Conclusion
----------
My current opinions are as follows:

* All input should be XML-ized. I believe that this allows for a much 
neater way to write webapps, and if we don't do it I see no reason for 
using XSLT as a template language.

* Using the input document for supplying OM data is impractical, as it 
requires all input to be either SAX or DOM, while a mix seems to be much 
more useful.

* If the document function can supply the XTTL generator with caching 
info, I believe that using the document function for OM input is the most 
attractive way.

* If not, I don't think it matters that much; both could use the 
DOM-ified data described above. What speaks for the document function is 
that you don't have to write any global xsl:param, and therefore can use 
the simplified XSLT format without an enclosing xsl:stylesheet. An 
advantage of parameters is that they can contain node-sets as well as 
nodes, so that you don't need to use the irritating root node all the 
time while accessing data.

                                   -- o --

IMO the large outstanding question is how to handle caching. For this it 
might be a good idea to discuss with the Xalan community.

We will need to write possibly caching-aware XML adapters both if we 
use the document function and if we use parameters; the only difference 
is whether the XML-adapted data is wrapped into a source and made 
available from the source resolver, or given as an argument to the XSLT 
processor.

So, let us discuss what data should be supplied to the XTTL generator, 
and let us start to design and implement XML adapters.


/Daniel Fagerstrom


