forrest-dev mailing list archives

From Nicola Ken Barozzi <>
Subject Re: file: implemented (Re: cvs commit: ...)
Date Fri, 13 Dec 2002 09:58:53 GMT

Andrew Savory wrote:
> On Fri, 13 Dec 2002, Jeff Turner wrote:
>>Because in the long run,  I would prefer to develop a separate wget-like
>>tool with cocoon-view hacks added to it, than to develop the CLI into a
>>full-blown threaded crawler.  Why?  Because a separate tool has a _much_
>>larger audience, so will evolve faster.  Yes, a Cocoon CLI may be more
>>elegant, but a separate tool can grow geometrically while the CLI grows
> I can see some serious advantages to splitting the crawler from the CLI:
> when the crawler is there, it would be fantastic to add a "precacher"
> using the crawler (go hit my entire site, including internal cocoon-views)
> rather than the "traditional" approach of running wget on a site. I
> suspect various other things that rely on crawling (such as search
> implementations like the Lucene code) would benefit from the speed
> increase of a dedicated crawler, too.
> I think it would be best done as part of Cocoon rather than Forrest though
> (or am I missing the point *again*? ;-), as there are more ways it would
> be used there.

In Cocoon CVS, there is a scratchpad effort to decouple the crawler 
from the CLI, along with an Ant task that can also use that crawler.

So yes, the crawler will most probably be independent of the CLI.

Nicola Ken Barozzi         
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
