cocoon-dev mailing list archives

From "Upayavira"
Subject Re: CLI question: how are links retrieved?
Date Sat, 17 May 2003 21:45:34 GMT

> I gather that some speedups were made by not requesting each page 3
> times (for content, links, and something else..).  Is the 'links' view
> as defined in the sitemap still used when crawling pages?

Just to note - you can still use the old method with the CLI (i.e. requesting each page
3 times); the option is still there to do it just the same as it was. In fact, I believe that
is the default behaviour.

> It would seem not, because I can completely delete the <map:view
> name="links"..> section, or corrupt its transformer @src, and the CLI
> still retrieves links from pages.

In the new behaviour, it does not use the links view, it uses a 'LinkGatherer' which 
collects the links and stores them in the ObjectModel for later use by the CLI. This is 
done in the org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode 
class, at the same point as the old-style CLI rewrites its links.
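
To make the gathering idea concrete, here is a rough sketch of it. This is not the actual LinkGatherer code: the class names, the "sketch.gathered-links" key, and the choice of href/src as the attributes to record are all assumptions made for illustration. It watches the SAX events, records the link attributes, and leaves the list in an object-model map for a caller to read afterwards.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class LinkGathererSketch {
    // Hypothetical key, chosen for this sketch; not the key Cocoon uses.
    static final String LINKS_KEY = "sketch.gathered-links";

    /** Records href/src attribute values as SAX events pass through. */
    static final class Gatherer extends DefaultHandler {
        private final List<String> links;

        Gatherer(List<String> links) {
            this.links = links;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.getQName(i);
                // Assumption: only href and src attributes count as links.
                if ("href".equals(name) || "src".equals(name)) {
                    links.add(attrs.getValue(i));
                }
            }
        }
    }

    /** Parses the page, gathers its links, and stores them in the object model. */
    static List<String> gather(String xml, Map<String, Object> objectModel) throws Exception {
        List<String> links = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new Gatherer(links));
        objectModel.put(LINKS_KEY, links);
        return links;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> objectModel = new HashMap<>();
        String page = "<html><body><a href=\"a.html\">A</a><img src=\"b.png\"/></body></html>";
        gather(page, objectModel);
        System.out.println(objectModel.get(LINKS_KEY)); // prints [a.html, b.png]
    }
}
```

The point of the sketch is only that the page is generated once and the links fall out as a side effect, rather than needing a separate request through a 'links' view.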

> The problem I'm having with Forrest is that:
> 1) site: and ext: links need to be rewritten by a transformer before
>    the CLI can follow them.  They are not rewritten with the new CLI,
>    causing broken links.

The links are gathered right at the end of the pipeline, just before the serializer, I
believe. So they should have been translated by then. Is that not happening? Can you
explain a little more what is supposed to be happening?

> 2) the filterlinks.xsl stylesheet, used only in the 'links' view
>    pipeline, is required to filter out unwanted links, and this isn't
>    being called.

I'm thinking of adding 'exclude' and 'include' options to the cli.xconf, so that you can 
exclude as necessary. Would that address your needs?
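
As a rough sketch of how such include/exclude options might behave once read from the configuration: nothing here exists in the CLI yet, and the class name, the use of regular expressions for the patterns, and the rule that excludes win over includes are all assumptions for the sake of discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class LinkFilterSketch {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    /** Adds an include pattern (assumed here to be a regular expression). */
    LinkFilterSketch include(String regex) {
        includes.add(Pattern.compile(regex));
        return this;
    }

    /** Adds an exclude pattern (assumed here to be a regular expression). */
    LinkFilterSketch exclude(String regex) {
        excludes.add(Pattern.compile(regex));
        return this;
    }

    /** Assumed rule: excludes win; with no includes, everything else passes. */
    boolean accepts(String link) {
        for (Pattern p : excludes) {
            if (p.matcher(link).matches()) {
                return false;
            }
        }
        if (includes.isEmpty()) {
            return true;
        }
        for (Pattern p : includes) {
            if (p.matcher(link).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Returns only the links the crawler should follow, in original order. */
    List<String> filter(List<String> links) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (accepts(link)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        LinkFilterSketch filter = new LinkFilterSketch()
                .exclude("api/.*")
                .exclude(".*\\.pdf");
        List<String> links = Arrays.asList("index.html", "api/Foo.html", "doc.pdf", "faq.html");
        System.out.println(filter.filter(links)); // prints [index.html, faq.html]
    }
}
```

Filtering of this kind would stand in for what filterlinks.xsl does in the 'links' view, though whether patterns alone cover every case that stylesheet handles is an open question.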

> Perhaps as a result of 1), I get lots of these stacktraces:
> java.lang.NullPointerException
>         at
>         org.apache.cocoon.environment.AbstractEnvironment.release(Abst
> at

Bleurgh. Don't know where to start on that one. But let's look at the above and see if
that helps.

I'll probably not be online until Monday, but I'll happily carry on a discussion then.

Regards, Upayavira
