cocoon-dev mailing list archives

From "Upayavira"
Subject Re: cli.xconf questions
Date Mon, 04 Aug 2003 20:06:25 GMT
On Mon, 4 Aug 2003 22:38:55 +1000, "Jeff Turner" said:
> On Mon, Aug 04, 2003 at 08:25:01AM +0000, Upayavira wrote:
> > On Sat, 2 Aug 2003 22:08:21 +1000, "Jeff Turner" said:
> > > Hi,
> > > 
> > > I'm tinkering around with the CLI, thinking how to add
> > > don't-crawl-this-page support, and have some questions on how cli.xconf
> > > currently works.  The following block in cli.xconf has me confused..
> > 
> > Jeff. Great to see you're engaging with it!
> It doubled Forrest's speed - I love it ;)

Great. And there's more we can do.

> > I have also been working on the CLI. I've spent my week's spare time
> > completely reworking it. I'll post separately about what I've been up to,
> > but basically the whole thing should be much easier to understand, with a
> > separate crawler class, a separate class for handling Cocoon
> > initialisation, and another for handling URI arithmetic (which you're
> > talking about below). As to adding exclusions, I think it should merely
> > be a question of identifying the syntax. The rest, with my new code,
> > should be pretty easy (e.g. tell the crawler what to ignore with a set of
> > wildcard parameters).
> Sounds marvellous.

I've started debugging now. I'll aim to commit later this week.
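
The wildcard-based exclusion idea quoted above could look roughly like the following. This is a minimal sketch, not the CLI's actual code: the class name `ExclusionFilter`, its methods, and the pattern syntax (`*` within one path segment, `**` across segments) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: tells a crawler which URIs to skip.
class ExclusionFilter {
    private final List<Pattern> patterns = new ArrayList<>();

    // Registers a wildcard pattern. Assumed syntax:
    // "**" matches any characters, "*" matches within a single path segment.
    void exclude(String wildcard) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < wildcard.length()) {
            if (wildcard.startsWith("**", i)) {
                regex.append(".*");
                i += 2;
            } else if (wildcard.charAt(i) == '*') {
                regex.append("[^/]*");
                i++;
            } else {
                // Every other character is matched literally.
                regex.append(Pattern.quote(String.valueOf(wildcard.charAt(i))));
                i++;
            }
        }
        patterns.add(Pattern.compile(regex.toString()));
    }

    // Returns true if the whole URI matches any registered pattern.
    boolean isExcluded(String uri) {
        for (Pattern p : patterns) {
            if (p.matcher(uri).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

A crawler would then check `isExcluded(uri)` before following a link, for example after registering `apidocs/**` to leave Javadocs alone.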

> > When I've got this going, I'm going to convert the xconf code to use a
> > Configuration object, and then write an Ant task to do the same
> > ProcessXConf, so that you can have the xconf code directly in your Ant
> > script. This Ant task will be a simple wrapper around the bean, and
> > should be pretty trivial.
> Mmm.. nice.  Might be some ideas to steal from Ant here, notably the idea
> of PatternSets and Mappers.

Yup. I'm keen to see what we can steal. Unfortunately, we'll have to code
it twice - it doesn't seem to be possible to share code between ant and

> > I have also, I think, just sorted my problem with my caching code not
> > working. Basically, the Cocoon cache is transient. So therefore it is
> > lost every time Cocoon starts. And Cocoon is started every time the CLI
> > starts. So if we want to have the CLI only generate new pages based upon
> > the cache, we've got to make the cache for the CLI persistent. Again, see
> > separate thread.
> This would be really awesome :)  Lots of people have asked if Forrest
> could only regenerate pages that have changed.  I'll defer further
> thoughts till the other thread.

Thread will come when I've got the basic code working.
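
To illustrate why persistence matters here, the sketch below keeps a per-URI timestamp in a properties file so that a later run can tell which pages need regenerating. It is a stand-in technique, not the Cocoon cache: the class `PersistentStamps`, its methods, and the use of a source last-modified time as the validity check are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical stand-in for a persistent cache: remembers, between runs,
// the source timestamp last seen for each URI.
class PersistentStamps {
    private final Path store;
    private final Properties stamps = new Properties();

    // Loads stamps saved by a previous run, if the file exists.
    PersistentStamps(Path store) {
        this.store = store;
        if (Files.exists(store)) {
            try (InputStream in = Files.newInputStream(store)) {
                stamps.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Returns true if the URI is new or its timestamp differs from the
    // recorded one, and records the new timestamp either way.
    boolean needsRegeneration(String uri, long lastModified) {
        String current = Long.toString(lastModified);
        String previous = stamps.getProperty(uri);
        stamps.setProperty(uri, current);
        return !current.equals(previous);
    }

    // Writes the stamps to disk so the next run can read them.
    void save() {
        try (OutputStream out = Files.newOutputStream(store)) {
            stamps.store(out, "last-seen source timestamps per URI");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a transient store, every run would see every URI as new; calling `save()` at the end of a run is what lets the next one skip unchanged pages.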
> ...
> > > Come to think of it, the attribute name 'src'
> > > doesn't really make sense.  What is the "source" of a Cocoon URI?  It
> > > would be the XML (documents/index.xml), which is not what we're
> > > specifying in @src.
> > 
> > It is the source for a source/destination pair. You could see it as a
> > cocoon: protocol source (almost). Would you suggest something different?
> No, makes sense given that explanation.


> > > I have the feeling that cli.xconf's job, mapping URIs to the filesystem,
> > > could potentially be quite intricate.  It is roughly an inverse of what
> > > the sitemap does.  Perhaps we need an analogous syntax?
> > 
> > Perhaps. I think we've only just started trying to work out what is
> > possible here. I'd be pleased to carry on the conversation, as what we
> > have at the moment is purely what I thought best, and not the result of
> > much community discussion.
> >
> > There's a lot we could discuss here. For example, how do we handle the
> > situation where we want to crawl a number of pages, but don't want to
> > have to repeat the destination for each of them? I think we could come up
> > with an elegant configuration for this. My <uri> thing is only the
> > beginning. 
> There is ${variable} interpolation code in Avalon, if that helps.  Eg.
> ${context-root} in logkit.xconf.

I'll look into that.
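
For reference, `${variable}` interpolation of the kind mentioned above can be sketched in a few lines. This is not Avalon's implementation: the class `Interpolator`, the `dest` variable, and the choice to leave unknown variables untouched are assumptions for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands ${name} placeholders from a map of values.
class Interpolator {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String interpolate(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown variables are left as written rather than replaced.
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Defining a destination once (say `dest` = `build/site`) and writing `${dest}/docs/index.html` in each entry would be one way to avoid repeating the destination for every page.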
> > The first thing to do is to start identifying the possible use cases for
> > URI mappings, so that we can see the range of the problem we're trying to
> > solve (and take it beyond the scope of just fixing my problems only!).
> Well, two observations:
> 1) Hosting a live Cocoon site is a PITA:
>  - One has to fight with sysadmins to install JVMs.  Many site hosts
>    (like SF) don't even offer Java-based services.
>  - JVMs permanently chew up vast amounts of memory
>  - Servlet containers hang, crash, throw OutOfMemoryExceptions and are
>    generally unreliable.
>  - Cocoon is not particularly fast
> 2) A surprising number of sites **don't need to be dynamic**
> So in walks our hero, the CLI.  We can get most of the magic of Cocoon,
> with none of the pain.  Develop a site with a live Cocoon, and when
> you're ready to deploy, serialize it to disk and serve through Apache.
> That's why I think the CLI is very important.  More than *anything* else,
> it has the potential to vastly widen Cocoon's audience.
> So from this perspective, the need is simple.  We need the CLI to provide
> as accurate a representation of the live site as possible.  Generally
> this means simply mirroring the URI structure to disk.
> Currently, the biggest unmet need is the ability to exclude certain URLs.
> There is usually non-Cocoon-generated content like Javadocs, or other
> parts of the site, which needs to be excluded.

Well, let's get that working well.
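
The "mirror the URI structure to disk" requirement quoted above amounts to a small mapping function. The sketch below is illustrative only: the class `UriMirror` and the rule that a URI ending in `/` is written as `index.html` are assumptions, not what the CLI currently does.

```java
import java.io.File;

// Hypothetical helper: maps a site URI to the file it would be written to.
class UriMirror {
    static File toFile(File destDir, String uri) {
        // Drop a leading slash so the path is relative to destDir.
        String path = uri.startsWith("/") ? uri.substring(1) : uri;
        // Assumed convention: directory-style URIs become index.html.
        if (path.isEmpty() || path.endsWith("/")) {
            path += "index.html";
        }
        return new File(destDir, path);
    }
}
```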

Are you willing to test my new version when it's ready?

Regards, Upayavira
