cocoon-dev mailing list archives

From Berin Loritsch
Subject Re: [RT] New Cocoon Site Crawler Environment
Date Wed, 18 Dec 2002 02:09:28 GMT
Nicola Ken Barozzi wrote:
> Vadim Gritsenko wrote:
>> Nicola Ken Barozzi wrote:
>> ...
>>> Why is it so slow?
>>> Mostly because it generates each source three times.
> [...]
>> Note: It gets the page with all the links translated using data 
>> gathered on previous step.
> [...]
>> We can combine getType and getLinks calls into one, see below.

If it does not scale the way things are now--and I agree generating
the source three times is two times too many--then we may have to
change things a bit more deeply.

For instance, part of the issue resides in the fact that any client
(i.e. CLI environment or Servlet) can only access one view at a time
for any resource.

Why not allow a client to access all the views of a resource that it
needs/wants simultaneously?  That would allow things like the
all-important profiling information to be appended after HTML pages
are rendered.

Cocoon is so entrenched in the single path of execution mentality that
environments that need the extra complexity can't have it.

Each resource should only need to be rendered once, and only
once.  Each view to the resource should be accessible by a client.

For instance, the CLI client wants the Link/Mime-Type information
and the content itself.  The Link/Mime-Type information is accessed
via the LinkSamplingEnvironment.  In reality, that is a poor name
for what you really want to represent.  It should be the
LinkSamplingView.  That view caches information that can be incorporated
back into the list of links we are resolving.
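
As a rough sketch of the "render once, expose every view" idea (all
names below, including MultiViewResource and the view keys, are
hypothetical and not existing Cocoon API), the resource could be
rendered lazily on first access and then hand out whichever views the
client asks for:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types exist in Cocoon.
final class MultiViewResource {
    private final String uri;
    // Assumed to produce every view of the resource in a single pass.
    private final Function<String, Map<String, Object>> renderer;
    private Map<String, Object> views; // filled on first access
    private int renderCount = 0;

    MultiViewResource(String uri, Function<String, Map<String, Object>> renderer) {
        this.uri = uri;
        this.renderer = renderer;
    }

    // Returns the named view, rendering the resource on first access only.
    synchronized Object getView(String name) {
        if (views == null) {
            views = renderer.apply(uri);
            renderCount++;
        }
        return views.get(name);
    }

    synchronized int getRenderCount() {
        return renderCount;
    }

    public static void main(String[] args) {
        MultiViewResource page = new MultiViewResource("index.html", uri -> {
            Map<String, Object> v = new HashMap<>();
            v.put("content", "<html>...</html>"); // placeholder content
            v.put("mime-type", "text/html");
            v.put("links", new String[] {"a.html", "b.html"});
            return v;
        });
        // The client pulls three views; the renderer runs only once.
        page.getView("content");
        page.getView("mime-type");
        page.getView("links");
        System.out.println("renders: " + page.getRenderCount());
    }
}
```

The point of the sketch is only that the client, not the environment,
decides which views it reads, while the rendering cost is paid once.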

Another issue I have is related to link crawling, but not to the
multi-view access: the error page generation.  It is not *always* an
error if a link is not handled by Cocoon.

A common example is the fact that JavaDocs are generated outside of
Cocoon, and the error page that screws up the link to the JavaDocs
is a *bad* thing.

Perhaps we should allow for known exclusions, or turn off the error
page generation for the missing links--recording them to a file like
we do now.
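
A minimal sketch of how both options might fit together, assuming
exclusions are simple path prefixes and using an in-memory list as a
stand-in for the file of missing links (MissingLinkPolicy, its methods
and the "apidocs/" prefix are all hypothetical, not Cocoon API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: decides what the crawler does with an unresolved link.
final class MissingLinkPolicy {
    private final List<String> excludedPrefixes;
    private final boolean generateErrorPages;
    // Stand-in for the file the missing links are recorded to.
    private final List<String> missingLinks = new ArrayList<>();

    MissingLinkPolicy(List<String> excludedPrefixes, boolean generateErrorPages) {
        this.excludedPrefixes = new ArrayList<>(excludedPrefixes);
        this.generateErrorPages = generateErrorPages;
    }

    // True if the link is known to be produced outside of Cocoon.
    boolean isExcluded(String link) {
        for (String prefix : excludedPrefixes) {
            if (link.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Called for a link Cocoon could not handle. Excluded links are left
    // alone; all others are recorded, and an error page is written only
    // if error page generation is switched on.
    boolean shouldWriteErrorPage(String link) {
        if (isExcluded(link)) {
            return false;
        }
        missingLinks.add(link);
        return generateErrorPages;
    }

    List<String> getMissingLinks() {
        return Collections.unmodifiableList(missingLinks);
    }

    public static void main(String[] args) {
        MissingLinkPolicy policy =
                new MissingLinkPolicy(Arrays.asList("apidocs/"), false);
        policy.shouldWriteErrorPage("apidocs/index.html"); // excluded, ignored
        policy.shouldWriteErrorPage("broken.html");        // recorded only
        System.out.println("missing: " + policy.getMissingLinks());
    }
}
```

Whether excluded links should also be recorded is a design choice; the
sketch assumes they should not, since they are not really missing.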

Just some food for thought.
