hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: HDFS: Good practices for Number of Blocks per Datanode
Date Fri, 02 May 2008 17:44:26 GMT
Cagdas Gerede wrote:
> For a system with 60 million blocks, we can have 3 datanodes with 20 million
> blocks each, or we can have 60 datanodes with 1 million blocks each. In
> either case, would there be performance implications or would they behave
> the same way?

If you're using MapReduce, then you want your computations to run on
nodes where the data is local.  The most cost-effective way to buy CPUs
is generally in 2-8 core boxes that hold 2-4 hard drives, and this also
tends to give good I/O performance.  In theory, boxes with 64 CPUs and
64 drives each will perform similarly to 16 times as many boxes, each
with 4 CPUs and 4 drives, but the former is more expensive, and when a
box fails you take a bigger hit.  Also, with more boxes you generally
get more network interfaces and hence more aggregate bandwidth,
assuming you have a good switch.
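
To make the tradeoff concrete, here's a rough back-of-envelope sketch
in Python comparing the two shapes from the question.  The 1 Gbps NIC
per box and the non-blocking switch are assumptions for illustration,
not measurements:

# Back-of-envelope comparison of the two cluster shapes above.
# The NIC speed and switch behavior are illustrative assumptions.

TOTAL_BLOCKS = 60_000_000
NIC_GBPS = 1  # assumed network interface speed per box

def summarize(num_nodes):
    blocks_per_node = TOTAL_BLOCKS // num_nodes
    # Fraction of the cluster's blocks that must be re-replicated
    # when a single box fails.
    failure_hit = 1 / num_nodes
    # Aggregate bandwidth scales with node count, assuming the
    # switch is not the bottleneck.
    aggregate_bw = num_nodes * NIC_GBPS
    print(f"{num_nodes:>2} nodes: {blocks_per_node:,} blocks/node, "
          f"{failure_hit:.1%} of data affected per box failure, "
          f"~{aggregate_bw} Gbps aggregate bandwidth")

for nodes in (3, 60):
    summarize(nodes)

With 3 nodes, a single failure affects 33.3% of the data and you get
roughly 3 Gbps of aggregate bandwidth; with 60 nodes, a failure
affects only 1.7% and aggregate bandwidth is roughly 60 Gbps.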

Doug
