hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Advice on new Datacenter Hadoop Cluster?
Date Thu, 01 Oct 2009 13:32:44 GMT

On Oct 1, 2009, at 7:13 AM, Steve Loughran wrote:

> Ryan Smith wrote:
>> I have a question that I feel I should ask on this thread.  Let's say
>> you want to build a cluster where you will be doing very little
>> map/reduce, storage and replication of data only on HDFS.  What would
>> the hardware requirements be?  No quad core?  Less RAM?
> Servers with more HDD per CPU, and less RAM. CPUs are a big slice not
> just of capital, but of your power budget. If you are running a big
> datacentre, you will care about that electricity bill.
> Assuming you go for 6 HDD in a 1U box, you could have 6 or 12 TB per U,
> then perhaps a 2-core or 4-core server with "enough" RAM.
> * With less M/R work, you could allocate most of that TB to storage and
> leave a few hundred GB for OS and logs.
> * You'd better estimate external load; if the cluster is storing data,
> then total network bandwidth will be 3X the data ingress (for
> replication = 3); read costs are that of the data itself. Also, there
> are 5 threads on 3 different machines handling the write-and-forward
> process.
> * I don't know how much load the datanode JVM would take with, say,
> 11 TB of managed storage underneath; that's memory and CPU time.
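
To put rough numbers on Steve's density and bandwidth points, here is a
quick back-of-the-envelope sketch (Python); the disk size, node count,
and ingress rate are made-up placeholders, not recommendations:

disks_per_node = 6        # 1U box with 6 HDD, per Steve's example
disk_tb = 2.0             # 2 TB drives -> 12 TB per U
nodes = 20                # made-up cluster size
replication = 3

raw_tb = disks_per_node * disk_tb * nodes
usable_tb = raw_tb / replication          # every block is stored 3 times
print("raw: %.0f TB, usable: %.0f TB" % (raw_tb, usable_tb))

# With replication = 3, each byte a client writes crosses the network
# roughly 3 times (client -> DN1 -> DN2 -> DN3 in the write pipeline),
# so total write traffic is about 3x the ingress; reads cost roughly
# the data itself.
ingress_mb_s = 200.0                      # hypothetical ingress rate
write_traffic_mb_s = ingress_mb_s * replication
print("cluster write traffic: ~%.0f MB/s" % write_traffic_mb_s)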

Datanode load is a function of the number of IOPS.  Basically, by buying
6 12TB nodes versus 3 24TB nodes, you double the number of IOPS per TB
of storage.
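
A minimal sketch of the spindle math, assuming both node types have the
same number of disk slots (say 12) and only the drive size differs, with
~100 random IOPS per spindle as a rule of thumb:

def iops_per_tb(nodes, tb_per_node, disks_per_node, iops_per_disk=100):
    total_iops = nodes * disks_per_node * iops_per_disk
    total_tb = nodes * tb_per_node
    return total_iops / float(total_tb)

print(iops_per_tb(nodes=6, tb_per_node=12, disks_per_node=12))  # 100.0 IOPS/TB
print(iops_per_tb(nodes=3, tb_per_node=24, disks_per_node=12))  # 50.0 IOPS/TB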

If you're using HDFS solely for backup, then the number of IOPS is so  
small you can assume it's zero.  We use HDFS for a non-mapreduce  
physics application, and our particular application mix is such that I  
target 1 batch system core per usable HDFS TB.
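
As a trivial illustration of that rule of thumb (the capacity and worker
spec are made-up numbers):

usable_hdfs_tb = 200                 # hypothetical usable capacity
target_batch_cores = usable_hdfs_tb  # 1 batch core per usable TB
cores_per_worker = 8                 # hypothetical worker node
print("~%d batch cores, i.e. ~%d workers of %d cores"
      % (target_batch_cores, target_batch_cores // cores_per_worker,
         cores_per_worker))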

> Is anyone out there running big datanodes? What do they see?

Our biggest is 48TB:
* They go offline for 5 minutes during block reports.  We use rack
awareness to make sure that both copies are not on big datanodes (a
sketch of a topology script is below).  Fixed in future releases
(0.20.0 even, maybe).
* When one disk goes out, the datanode shuts down - meaning that 48
disks go out.  This is to be fixed in 0.21.0, I think.
* The CPUs (4 cores) are pegged when the system is under full load.
If I had a chance, I'd give it more CPU horsepower.
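
For what it's worth, the rack-awareness trick above is just the standard
topology script mechanism (topology.script.file.name in 0.20-era
configs).  Here is a minimal sketch of such a script; the hostnames and
grouping are hypothetical:

#!/usr/bin/env python
# Report the dense 48TB nodes as their own "rack" so the default block
# placement policy never puts every replica of a block on the big nodes.
# Hostnames are made up; depending on configuration the namenode may
# pass IP addresses instead.
import sys

BIG_NODES = set(["bignode01.example.edu", "bignode02.example.edu"])

for arg in sys.argv[1:]:
    if arg in BIG_NODES:
        print("/big-datanodes")
    else:
        print("/default-rack")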

As usual, everyone's application is different enough that any anecdote  
is possibly not applicable.

