cocoon-dev mailing list archives

From Nicola Ken Barozzi
Subject Re: [RT] New Cocoon Site Crawler Environment
Date Wed, 18 Dec 2002 07:40:35 GMT

Berin Loritsch wrote:
> Nicola Ken Barozzi wrote:
>> Vadim Gritsenko wrote:
>>> Nicola Ken Barozzi wrote:
>>> ...
>>>> Why is it so slow?
>>>> Mostly because it generates each source three times.
>> [...]
>>> Note: It gets the page with all the links translated using data 
>>> gathered on previous step.
>> [...]
>>> We can combine getType and getLinks calls into one, see below.
> If it does not scale the way things are now--and I agree generating
> the source three times is two times too many--then we may have to
> change things a bit more deeply.
> For instance, part of the issue resides in the fact that any client
> (i.e. CLI environment or Servlet) can only access one view at a time
> for any resource.
> Why not allow a client to access all views to a resource that it
> needs/wants simultaneously?  That will allow things like the all
> important profiling information to be appended after HTML pages
> are rendered.

I thought of this too. But in practice?...

> Cocoon is so entrenched in the single path of execution mentality that
> environments that need the extra complexity can't have it.
> Each resource should only need to be rendered once, and only
> once.  Each view to the resource should be accessible by a client.
> For instance, the CLI client wants the Link/Mime-Type information
> and the content itself.  The Link/Mime-Type information is accessed
> via the LinkSamplingEnvironment.  In reality, that is a poor name
> for what you are really wanting to represent.  It should be the
> LinkSamplingView.  That view caches information that can be incorporated
> back into the list of links we are resolving.

Ok, but in practice, how does the client request the view results?
I kinda like this non-blocking view concept, but fail to see clearly the
practical implementation.
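
To make the question concrete, here is a rough sketch of how I read the proposal: the resource is rendered once, and every view the client registered receives the same pass. Nothing here is existing Cocoon API. The View interface, the Resource class, the callback names, and the sample links are all hypothetical; only the LinkSamplingView name comes from the mail above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one render pass feeding several views at once. */
public class MultiViewSketch {

    /** A view listens to the single render pass and keeps its own result. */
    public interface View {
        void onLink(String href);
        void onMimeType(String type);
        void onContent(String chunk);
    }

    /** Gathers links and MIME type, the job proposed for a "LinkSamplingView". */
    public static class LinkSamplingView implements View {
        public final List<String> links = new ArrayList<>();
        public String mimeType;
        public void onLink(String href) { links.add(href); }
        public void onMimeType(String type) { mimeType = type; }
        public void onContent(String chunk) { /* not interested in the body */ }
    }

    /** Keeps the serialized page, which a CLI client would write to disk. */
    public static class ContentView implements View {
        public final StringBuilder body = new StringBuilder();
        public void onLink(String href) { /* ignored */ }
        public void onMimeType(String type) { /* ignored */ }
        public void onContent(String chunk) { body.append(chunk); }
    }

    /** Stand-in for a pipeline; the page and its links are made up. */
    public static class Resource {
        public int renderCount = 0;

        /** Renders once and hands the same events to every registered view. */
        public void render(View... views) {
            renderCount++;
            for (View v : views) {
                v.onMimeType("text/html");
                v.onLink("index.html");
                v.onLink("apidocs/index.html");
                v.onContent("<html>...</html>");
            }
        }
    }

    public static void main(String[] args) {
        Resource page = new Resource();
        LinkSamplingView linkView = new LinkSamplingView();
        ContentView contentView = new ContentView();
        page.render(linkView, contentView);
        System.out.println(page.renderCount + " render, links " + linkView.links
                + ", type " + linkView.mimeType
                + ", " + contentView.body.length() + " chars");
    }
}
```

In this reading the client asks for view results simply by holding on to the view objects it passed in. It does not answer how views would be declared in the sitemap, which is the part I still can't see.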

> Another issue I have is related to link crawling, but not to
> the multi-view access: the error page generation.  It is not
> *always* an error if a link is not handled by Cocoon.
> A common example is the fact that JavaDocs are generated outside of
> Cocoon, and the error page that screws up the link to the JavaDocs
> is a *bad* thing.

This is not really a CLI error, but the fact that Cocoon (wrongly IMHO) 
doesn't handle that part of the URI space. We are dealing with this 
concept in Forrest, where we have seen that complete sub-URI spaces can 
be dealt with without link crawling, and so it's feasible to have Cocoon 
serve all those javadocs and not break.

Anyway, there is a way of not making the link be crawled, by setting the 
xlink attribute.

> Perhaps we should allow for known exclusions, or turn off the error
> page generation for the missing links--recording them to a file like
> we do now.

Yup, should be settable +1
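
A minimal sketch of what "settable" could look like, assuming a prefix-based exclusion list and a boolean switch for error pages. The class, its method names, and the "apidocs/" prefix are hypothetical and not part of the current CLI; writing the recorded links to a file is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical policy object for links that Cocoon cannot resolve. */
public class BrokenLinkPolicySketch {
    private final List<String> excludedPrefixes;
    private final boolean generateErrorPages;
    private final List<String> brokenLinks = new ArrayList<>();

    public BrokenLinkPolicySketch(List<String> excludedPrefixes,
                                  boolean generateErrorPages) {
        this.excludedPrefixes = new ArrayList<>(excludedPrefixes);
        this.generateErrorPages = generateErrorPages;
    }

    /** True if the URI falls under a known exclusion, e.g. externally built JavaDocs. */
    public boolean isExcluded(String uri) {
        for (String prefix : excludedPrefixes) {
            if (uri.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Called for a link that could not be handled. Excluded links are
     * skipped silently; all others are recorded. The return value tells
     * the caller whether an error page should be written in its place.
     */
    public boolean onMissing(String uri) {
        if (isExcluded(uri)) {
            return false;
        }
        brokenLinks.add(uri);
        return generateErrorPages;
    }

    /** Links recorded so far, in the order they were reported. */
    public List<String> getBrokenLinks() {
        return Collections.unmodifiableList(brokenLinks);
    }

    public static void main(String[] args) {
        BrokenLinkPolicySketch policy =
                new BrokenLinkPolicySketch(Arrays.asList("apidocs/"), false);
        policy.onMissing("apidocs/index.html"); // excluded, not recorded
        policy.onMissing("missing.html");       // recorded, no error page
        System.out.println(policy.getBrokenLinks());
    }
}
```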

Nicola Ken Barozzi         
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
