cocoon-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: CLI question: how are links retrieved?
Date Sun, 18 May 2003 09:47:46 GMT
On Sat, May 17, 2003 at 10:45:34PM +0100, Upayavira wrote:
> Jeff,
> 
> > I gather that some speedups were made by not requesting each page 3
> > times (for content, links, and something else..).  Is the 'links' view
> > as defined in the sitemap still used when crawling pages?
> 
> Just to note - you can still use the old method with the CLI (i.e.
> requesting each page 3 times), the option is still there to do it just
> the same as it was. In fact, I believe that is the default behaviour.

Thanks, didn't know that.  I've gotten Forrest working this way now.

> > It would seem not, because I can completely delete the <map:view
> > name="links"..> section, or corrupt its transformer @src, and the CLI
> > still retrieves links from pages.
> 
> In the new behaviour, it does not use the links view, it uses a
> 'LinkGatherer' which collects the links and stores them in the
> ObjectModel for later use by the CLI. This is done in the
> org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode class,
> at the same point as the old-style CLI rewrites its links.
>  
> > The problem I'm having with Forrest is that:
> > 
> > 1) site: and ext: links need to be rewritten by a transformer before
> >    the CLI can follow them.  They are not rewritten with the new CLI,
> >    causing broken links.
> 
> The links are gathered right at the end of the pipeline, just before
> the serializer, I believe.

So say I use multiple pipelines to generate a response:

<map:match pattern="index.html">
  <map:aggregate>
    <map:part src="cocoon:/body-index.xml"/>
    ...
  </map:aggregate>
  <map:serialize type="html"/>
</map:match>

<map:match pattern="body-index.xml">
  <map:generate src="cocoon:/index.xml"/>
  <map:transform src="linkrewriter" .../>
  ...
  <map:serialize type="xml"/>
</map:match>

<map:match pattern="index.xml">
  <map:generate src="content/xdocs/index.xml"/>
  ...
  <map:serialize type="xml"/>
</map:match>


Which pipeline would the links be sampled from?  I'm speculating that the
first serializer to be invoked (index.xml) writes the links to the object
model.  This would explain what I'm seeing, as links are only rewritten
in the second pipeline.
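
To make that guess concrete, here is a toy model in Python.  It is not
Cocoon code, and every name in it is made up for illustration.  It assumes
the gatherer keeps whatever the first serializer to finish hands it and
ignores later ones, which is only my speculation:

```python
import re

# Toy model of the hypothesis only; none of these names are Cocoon APIs.
HREF = re.compile(r'href="([^"]*)"')

def gather_links(xml, object_model):
    """Record links only if no earlier serializer has already done so."""
    object_model.setdefault("links", HREF.findall(xml))

def index_xml(object_model):
    # Innermost pipeline: raw content, links not yet rewritten.
    xml = '<a href="site:index">Home</a>'
    gather_links(xml, object_model)  # first serializer to run
    return xml

def body_index_xml(object_model):
    # Middle pipeline: the (hypothetical) rewrite step happens here.
    xml = index_xml(object_model).replace("site:index", "../index.html")
    gather_links(xml, object_model)  # too late under this hypothesis
    return xml

def index_html(object_model):
    # Outermost pipeline: aggregates the body and serializes the page.
    xml = body_index_xml(object_model)
    gather_links(xml, object_model)
    return xml

object_model = {}
page = index_html(object_model)
```

Under that assumption the page itself ends up with the rewritten href,
while the stored link list still holds the raw site: link, which matches
the symptom.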

> So they should have been translated by then.  Is that not happening? Can
> you explain a little more what is supposed to be happening?

As shown above, there is a three-layer sitemap.  The 'linkrewriter'
transformer converts tags like <a href="site:index"> to <a
href="../index.html">.  They're not being converted when using the
'cli.xconf' CLI.
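
As a rough illustration of what the rewrite step amounts to, here is a
Python sketch.  The lookup table and function name are hypothetical (the
real transformer takes its targets from the site's own configuration);
only the site:index example comes from above:

```python
import re

# Hypothetical lookup table, for illustration only.
SITE_LINKS = {"index": "../index.html"}

def rewrite_links(xml, table=SITE_LINKS):
    """Replace href="site:KEY" with the mapped path; leave unknown keys as-is."""
    def swap(match):
        key = match.group(1)
        return 'href="%s"' % table.get(key, "site:" + key)
    return re.sub(r'href="site:([^"]*)"', swap, xml)
```

Any crawler that reads links before this step sees only the site: form,
which it cannot follow.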

> > 2) the filterlinks.xsl stylesheet, used only in the 'links' view
> >    pipeline, is required to filter out unwanted links, and this isn't
> >    being called.
> 
> I'm thinking of adding 'exclude' and 'include' options to the
> cli.xconf, so that you can exclude as necessary. Would that address
> your needs?

It would.  I quite like the 'links view' system though.  Doing link
manipulation in a Cocoon pipeline seems generally more flexible than
adding cli.xconf parameters, and is much easier to debug with
?cocoon-view=links
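
For comparison, the include/exclude idea boils down to something like the
Python sketch below.  The glob-style patterns and the function name are
my assumptions, not a proposal for the actual cli.xconf syntax:

```python
from fnmatch import fnmatchcase

def filter_links(links, include=("*",), exclude=()):
    """Keep links matching any include pattern and no exclude pattern."""
    return [
        link for link in links
        if any(fnmatchcase(link, pat) for pat in include)
        and not any(fnmatchcase(link, pat) for pat in exclude)
    ]

kept = filter_links(
    ["index.html", "api/Foo.html", "mailto:someone@example.org"],
    exclude=("api/*", "mailto:*"),
)
```

That covers plain filtering, but it cannot transform a link the way a
stylesheet in the links view can.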

> > Perhaps as a result of 1), I get lots of these stacktraces:
> > 
> > java.lang.NullPointerException
> >         at org.apache.cocoon.environment.AbstractEnvironment.release(AbstractEnvironment.java:511)
> >         at
> 
> Bleurgh. Don't know where to start on that one. But let's look at the
> above and see if that helps.

Further digging shows it happening when a <map:redirect-to> is
encountered.  Attached is a small diff demonstrating this for 'build
docs' in Cocoon.  I'll file a bug report to keep track of this.

> I'll probably not be online until Monday, but I'll happily carry on a
> discussion then.

Thanks for your help!

--Jeff

> Regards, Upayavira
> 
