gump-general mailing list archives

From Daniel Rall <...@finemaltcoding.com>
Subject Re: Let Googlebot crawl our cvs?
Date Thu, 04 Dec 2003 07:48:16 GMT
I seem to recall that use of ViewCVS's "*checkout*"-style URLs is fairly 
expensive on the server side.  viewcvs.py uses bincvs.rcslib -- which in turn 
uses the rcsfile binary -- to parse the ,v files in the CVS repository itself. 
This operation scales linearly with the size of the ,v file (a function of the 
number of changes and the size of each delta).  That is likely part of the reason 
we block robots from browsing the repository with a robots.txt of:

User-agent: *
Disallow: /
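
For a concrete picture of the cost, here is a minimal Python sketch (not 
ViewCVS's actual code) of roughly what a *checkout*-style request amounts to 
on the server: every hit reconstructs the requested revision from the ,v file, 
here by shelling out to the RCS co binary, which has to walk the stored delta 
chain.  The repository path and revision in the example are hypothetical.

    import subprocess

    def checkout_revision(rcs_file, revision):
        # `co -p<rev>` prints the requested revision on stdout; to produce
        # it, co must apply the deltas stored in the ,v file, so the
        # per-request cost grows with the number and size of those deltas.
        result = subprocess.run(
            ["co", "-q", "-p" + revision, rcs_file],
            capture_output=True, check=True, text=True,
        )
        return result.stdout

    if __name__ == "__main__":
        # Hypothetical path and revision; a real *checkout* URL would map
        # to values like these on each request.
        print(checkout_revision("/home/cvs/jakarta-commons/LICENSE,v", "1.1"))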

Reconstituting the trunk trades a little disk space for better performance and 
scalability.  Reconstituting other branches might be useful as well, but would 
consume considerably more disk space.
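
As a rough sketch of what "reconstituting the trunk" could look like (made-up 
paths, and only an illustration rather than whatever script would actually run 
on cvs.apache.org): walk the repository once, check the head revision of each 
,v file out, and write it into a static tree that a crawler can fetch without 
the server re-running RCS on every request.

    import os
    import subprocess

    def export_trunk(repo_root, out_root):
        for dirpath, dirnames, filenames in os.walk(repo_root):
            # Skip the Attic directories that hold removed files.
            dirnames[:] = [d for d in dirnames if d != "Attic"]
            for name in filenames:
                if not name.endswith(",v"):
                    continue
                rcs_file = os.path.join(dirpath, name)
                rel = os.path.relpath(rcs_file, repo_root)[:-2]  # strip ",v"
                dest = os.path.join(out_root, rel)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                # With no revision given, `co -q -p` emits the head of the trunk.
                with open(dest, "wb") as out:
                    subprocess.run(["co", "-q", "-p", rcs_file],
                                   stdout=out, check=True)

    if __name__ == "__main__":
        # Hypothetical source and destination paths.
        export_trunk("/home/cvs", "/var/www/static-trunk")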


Martin van den Bemt wrote:
> Hmm, I think I get too much mail to remember everything :) 
> Sorry :)
> But maybe the cvsweb url is a nice idea anyway :)
> 
> Mvgr,
> Martin
> 
> On Wed, 2003-12-03 at 15:13, Davanum Srinivas wrote:
> 
>>Martin,
>>
>>Here's my original email.....
>>
>>"I was looking for a replacement for JDK1.4 String.split that would work in 1.3 environment
and
>>found that turbine had one (http://www.google.com/search?q=stringutils++split+site%3Aapache.org)
>>and then i was trying to find where in our cvs if the latest code and took a while
to finally
>>found it in Jakarta commons' lang project. 
>>
>>To cut a long story short....Should we make finding existing code easier by allowing
google's
>>crawler to crawl http://cvs.apache.org/viewcvs/? (currently there is a
>>http://cvs.apache.org/robots.txt that prevents this from happening)."

