forrest-dev mailing list archives

From Stefano Mazzocchi <stef...@apache.org>
Subject Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)
Date Fri, 13 Dec 2002 03:22:06 GMT
Andrew Savory wrote:
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> 
>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>command-line?  Imagine how long it would take for the crawler to grind
>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>B.
> 
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?

A file copy is a native operation. In a modern operating system with a 
modern JVM it can be performed using DMA. So it's lightspeed compared to 
anything that Cocoon will be able to do.
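
To make the "native operation" point concrete, here is a minimal sketch of a bulk copy that hands the work to the operating system through java.nio. The class name BulkCopy and its copy method are hypothetical, chosen only for illustration; this is not code from Cocoon or Forrest. FileChannel.transferTo lets the OS move the bytes directly where the platform supports it, instead of pulling every byte through a pipeline.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not part of Cocoon).
public class BulkCopy {

    /** Copies src to dst and returns the number of bytes transferred. */
    public static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long done = 0;
            // transferTo may move fewer bytes than requested, so loop.
            while (done < size) {
                long n = in.transferTo(done, size - done, out);
                if (n <= 0) {
                    break; // source shrank or nothing more to move
                }
                done += n;
            }
            return done;
        }
    }
}
```

Whether the transfer really bypasses user space depends on the OS and JVM, so treat this as a sketch of the idea rather than a performance guarantee.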

But we are talking about 'bulk copy'.

If we talk about scanning for links (and any wget-like crawler, the 
Cocoon CLI or others, has to do this), then there is no technical reason 
why the Cocoon CLI has to be slower than, say, a wget java clone.
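
As a rough illustration of the link-scanning step that any such crawler shares, here is a small sketch that pulls href targets out of a page. The class name LinkScanner and the regular expression are assumptions made for this example; a real crawler (the Cocoon CLI included) may well gather links differently, and a regex is a simplification of proper HTML parsing.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not the Cocoon crawler).
public class LinkScanner {

    // Captures the target of href="..." or href='...', stopping before any
    // #fragment. Pure fragment links such as href="#top" do not match.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct link targets found in html, in document order. */
    public static Set<String> links(String html) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

The scan costs the same no matter which tool runs it, which is the point: the work per page is reading the content once and collecting its links.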

-- 
Stefano Mazzocchi                               <stefano@apache.org>
--------------------------------------------------------------------


