cocoon-dev mailing list archives

From Nicola Ken Barozzi <nicola...@apache.org>
Subject Re: [RT] Fixing the CLI
Date Mon, 24 Feb 2003 17:57:31 GMT

Upayavira wrote, On 24/02/2003 18.24:
>>>This will be possible soon. Functionality is there, command line
>>>option is missing ATM. If you need it urgently, you can fix it in 10
>>>minutes.
>>
>>Yes, it's on my TODO list, along with other CLI optimizations I 
>>discussed with Vadim :-)
> 
> 
> Seeing as this was my fault (I didn't add an option to Main.java when I split Main 
> into Main and CocoonBean, patch applied by Vadim), I have done it now. 

Not your fault. It's an additional feature :-D

> There's 
> now an option (-e), which allows you to switch off confirmation of extensions (-e 
> false will switch it off, default is true to maintain existing functionality).
> 
> I've also added an option to pre-load a class, so that the CLI can be used to 
> generate database-driven sites (-L <classname>). It can be repeated to allow the
> loading of more than one class.
> 
> As soon as I can get my Cocoon system to work (see invalid config message), I'll 
> test and post a patch to Bugzilla.

Excellent! :-D
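Here is a rough sketch of how the two new options could be parsed. It is hypothetical: the class `CliOptions` and its fields are illustrative, and it is not the code in Cocoon's `Main.java`. It treats `-e` as taking a boolean value that defaults to true and `-L` as repeatable. Each class is pre-loaded with `Class.forName`, which runs the class's static initialiser. That is the usual way a JDBC driver registers itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the -e and -L options; not Cocoon's Main.java.
//   -e <true|false>  confirm extensions (default true, the existing behaviour)
//   -L <classname>   pre-load a class; may be repeated
public class CliOptions {
    boolean confirmExtensions = true;
    final List<String> preloadClasses = new ArrayList<>();

    static CliOptions parse(String[] args) {
        CliOptions opts = new CliOptions();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-e":
                    opts.confirmExtensions = Boolean.parseBoolean(requireValue(args, ++i, "-e"));
                    break;
                case "-L":
                    opts.preloadClasses.add(requireValue(args, ++i, "-L"));
                    break;
                default:
                    // other options and target URIs are ignored in this sketch
                    break;
            }
        }
        return opts;
    }

    private static String requireValue(String[] args, int i, String option) {
        if (i >= args.length) {
            throw new IllegalArgumentException(option + " needs a value");
        }
        return args[i];
    }

    // Load each class so its static initialiser runs (e.g. a JDBC driver registering itself).
    void preload() throws ClassNotFoundException {
        for (String name : preloadClasses) {
            Class.forName(name);
        }
    }

    public static void main(String[] args) throws Exception {
        CliOptions opts = parse(new String[] {"-e", "false", "-L", "java.util.ArrayList"});
        opts.preload();
        System.out.println(opts.confirmExtensions + " " + opts.preloadClasses);
    }
}
```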

> I'd be interested to hear about these CLI optimizations you refer to.

Well... hmmm... ok, let me see if I remember it well enough.

There are two sets of optimizations possible: traversing optimizations 
and sitemap short-circuits.

Traversing optimizations
-------------------------

As you know, the Cocoon CLI gets the content of a page three times.
I had refactored these three calls to Cocoon into the following methods (in call order):

  1 - getLinks
       First the page is generated and the link view is used to
       get the links

  2 - getType (called in translateURI)
       Then the type of the page is needed, so we know if we need
       to add an extension or other things; basically to translate
       the URI

  3 - getPage
       Actually gets the page *and* uses the translated URIs in the links

Now, with the -e option (-e false) we basically don't need step 2. If done 
correctly, skipping it will increase the speed! :-)

So we have two steps left: getting links and getting the page.
If we can make them into a single step we're done.
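The cost is easier to see in a small model of the traversal. This sketch assumes a hypothetical `PageSource` interface. Its three methods stand in for the getLinks/getType/getPage steps listed above and are not Cocoon's actual signatures. A stub site counts how many times something is rendered. With extension confirmation on, every page is rendered three times. With `-e false`, step 2 is skipped and the count drops to two.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the CLI traversal; method names mirror the steps above.
public class TraversalSketch {
    interface PageSource {
        List<String> getLinks(String uri); // step 1: render the link view
        String getType(String uri);        // step 2: render again to learn the MIME type
        String getPage(String uri);        // step 3: render the page itself
    }

    // Stub site that counts how many times Cocoon would be asked to render something.
    static class CountingSite implements PageSource {
        final Map<String, List<String>> links;
        int renders = 0;
        CountingSite(Map<String, List<String>> links) { this.links = links; }
        public List<String> getLinks(String uri) {
            renders++;
            return links.getOrDefault(uri, Collections.<String>emptyList());
        }
        public String getType(String uri) { renders++; return "text/html"; }
        public String getPage(String uri) { renders++; return "<html>" + uri + "</html>"; }
    }

    // Breadth-first crawl returning URI -> output file name.
    // When confirmExtensions is false, step 2 (getType) is skipped entirely.
    static Map<String, String> crawl(PageSource site, String start, boolean confirmExtensions) {
        Map<String, String> written = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String uri = queue.poll();
            if (!seen.add(uri)) continue;
            List<String> found = site.getLinks(uri);
            String target = uri;
            if (confirmExtensions && "text/html".equals(site.getType(uri))) {
                target = uri + ".html"; // translateURI: add the extension
            }
            site.getPage(uri);          // the real CLI would save this content as 'target'
            written.put(uri, target);
            queue.addAll(found);
        }
        return written;
    }
}
```

For a three-page site this gives 9 renders with confirmation and 6 without.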

Cocoon has the concept of pluggable pipelines, and each pipeline is 
responsible for connecting the various components. If we used a pipeline 
that simply inserts, between the source and the next components, a pipe 
that records all links 
(org.apache.cocoon.xml.xlink.ExtendedXLinkPipe.java) into the 
Environment, we could effectively get both the result and the links in a 
single pass.
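The single-pass idea can be illustrated with plain SAX. This sketch uses a JAXP `XMLFilterImpl` as a stand-in for a link-recording pipe such as ExtendedXLinkPipe. It neither uses nor reproduces that class. The filter records `href`, `src` and `*:href` attribute values as events stream past and forwards every event unchanged. A trivial `TextCollector` plays the role of the downstream serializer. The class names and the set of link attributes are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SinglePassSketch {
    // Records link-like attributes while passing every SAX event through unchanged.
    static class LinkRecordingFilter extends XMLFilterImpl {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getQName(i);
                if (name.equals("href") || name.equals("src") || name.endsWith(":href")) {
                    links.add(atts.getValue(i));
                }
            }
            super.startElement(uri, localName, qName, atts); // forward to the next stage
        }
    }

    // Stand-in for the serializer: just collects character data.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // One parse produces both the downstream output and the list of links.
    static LinkRecordingFilter process(String xml, ContentHandler next) throws Exception {
        LinkRecordingFilter filter = new LinkRecordingFilter();
        filter.setParent(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
        filter.setContentHandler(next);
        filter.parse(new InputSource(new StringReader(xml)));
        return filter;
    }
}
```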

NOTE: This is possible *only* if we use the -e option. If we don't, the 
URL translation needed makes it impossible to do it in a single step, 
unless we keep the documents in memory and use a recursive algorithm, 
which poses bigger scalability problems.


Sitemap short-circuits
-------------------------

Sometimes in the sitemap you will find things like:

     <map:match pattern="*/**">
       <map:read mime-type="text/html" src="docs/{1}/{2}.html"/>
     </map:match>

In this case the CLI fails to copy all the HTML files that the webapp 
version serves, since it only finds pages by following links.

We *could* pass it in the pipeline and traverse the links, but what if we 
didn't want to touch the HTML at all? Imagine also that those HTML files 
are 5MB of Javadocs... ;-)

So in this case the CLI could see that we have a match with a reader on 
the local filesystem, and locally "invert" the pipeline with an 
optimization. That is, it could simply copy all the HTML files under the 
docs dir, which plain Java can do orders of magnitude faster than Cocoon.
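Here is a sketch of the "inverted" reader pipeline. It assumes the match shown above (`*/**` mapped to `docs/{1}/{2}.html`). It also assumes the CLI would save URI `a/b` as `a/b.html`. The method name `copyHtml` is hypothetical. It walks the docs directory with `java.nio.file.Files.walk` and copies every `.html` file that sits at least one directory deep, because `*/**` only matches URIs that contain a slash.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReaderShortCircuit {
    // Hypothetical helper: copies straight from disk what the sitemap match
    //   <map:match pattern="*/**"> <map:read src="docs/{1}/{2}.html"/>
    // would serve, without going through Cocoon. Returns the number of files copied.
    static int copyHtml(Path docsDir, Path destDir) throws IOException {
        List<Path> matches;
        try (Stream<Path> files = Files.walk(docsDir)) {
            matches = files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".html"))
                // "*/**" only matches URIs containing a slash, i.e. files in a subdirectory
                .filter(p -> docsDir.relativize(p).getNameCount() >= 2)
                .collect(Collectors.toList());
        }
        for (Path src : matches) {
            // URI "a/b" is read from docs/a/b.html; assume the CLI saves it as a/b.html
            Path target = destDir.resolve(docsDir.relativize(src));
            Files.createDirectories(target.getParent());
            Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return matches.size();
    }
}
```

Collecting the matches first keeps the directory stream closed before any copying starts.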

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

