hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Hadoop Performance Settings
Date Sat, 01 Apr 2006 18:27:43 GMT
Apache's web site is almost entirely hosted on a single machine, so a
polite crawl of it must be single-threaded.  Adding machines will not
make this sort of crawl much faster.  Try crawling a diverse set of
hosts, e.g., by seeding with DMOZ.
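
To make the point concrete, here is a rough back-of-the-envelope sketch in Python. It is not Nutch or Hadoop code. The function name, the one-second politeness delay, and the example URLs are all assumptions made up for illustration. It models a polite crawl as "at most one request per host per delay interval" and gives a lower bound on the crawl time.

```python
from collections import Counter
from urllib.parse import urlsplit


def estimate_crawl_seconds(urls, workers, delay=1.0):
    """Lower bound on crawl time for a polite crawl (hypothetical model).

    Assumes each host receives at most one request every `delay` seconds
    and ignores fetch latency, parsing, and DFS overhead.
    """
    per_host = Counter(urlsplit(u).hostname for u in urls)
    if not per_host:
        return 0.0
    # Pages on one host are fetched one after another, however many
    # workers are available.
    slowest_host = max(per_host.values()) * delay
    # Best case for the total work if it spreads evenly over the workers.
    evenly_spread = len(urls) * delay / workers
    return max(slowest_host, evenly_spread)


# Illustrative inputs only: 600 pages on one host vs. 600 pages over 100 hosts.
single_host = [f"http://apache.org/page{i}" for i in range(600)]
diverse = [f"http://site{i % 100}.example/page{i}" for i in range(600)]

for name, urls in (("single host", single_host), ("diverse", diverse)):
    for workers in (1, 6):
        seconds = estimate_crawl_seconds(urls, workers)
        print(f"{name}, {workers} worker(s): at least {seconds:.0f} s")
```

Under this model the single-host crawl is bounded by the one host's queue whether 1 or 6 workers are used, while the diverse crawl's bound drops as workers are added.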


Dennis Kubes wrote:
> I am finding that it is taking me longer to do the same crawl (just
> apache.org) on the DFS across 6 machines than it does on 1 local filesystem.
> Where should I look for Hadoop performance settings, etc?  I am just looking
> for some direction.
> Dennis
