hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: hadoop hardware configuration
Date Thu, 28 May 2009 16:15:26 GMT

On May 28, 2009, at 10:32 AM, Ian Soboroff wrote:

> Brian Bockelman <bbockelm@cse.unl.edu> writes:
>
>> Despite my trying, I've never been able to come even close to pegging
>> the CPUs on our NN.
>>
>> I'd recommend going for the fastest dual-cores which are affordable --
>> latency is king.
>
> Clue?
>
> Surely the latencies in Hadoop that dominate are not cured with faster
> processors, but with more RAM and faster disks?
>
> I've followed your posts for a while, so I know you are very experienced
> with this stuff... help me out here.

Actually, that's more of a gut feeling than an informed decision.
Because the locking is rather coarse-grained, having many CPUs isn't
going to win anything, so I'd rather have any CPU-bound portions run as
fast as possible.  Under the highest load, I think we've been able to
get up to 25% CPU utilization; thus, I'm guessing any CPU-related
improvements will come from faster cores, not more of them.
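
As a rough illustration of the coarse-grained locking argument (not a
measurement of the NN), Amdahl's law gives an upper bound on the speedup
from extra cores when part of the work is serialized behind one lock.
The 90% serial fraction below is a made-up number chosen only to show
the shape of the curve; the function name is likewise hypothetical.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup from `cores` CPUs when `serial_fraction`
    of the work must run under a single global lock (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # ASSUMPTION: 90% of request handling is serialized by the lock.
    # This figure is illustrative only; it is not from the thread.
    for cores in (1, 2, 4, 8):
        bound = amdahl_speedup(0.9, cores)
        print(f"{cores} cores: at most {bound:.2f}x")
```

Under that assumption, going from two cores to eight barely moves the
bound, whereas a faster core speeds up the serialized part directly.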

For my cluster, if I had a lot of money, I'd spend it on a hot-spare  
machine.  Then, I'd spend it on upgrading the RAM, followed by disks,  
followed by CPU.

Then again, for the cluster in the original email, I'd save money on  
the namenode and buy more datanodes.  We've got about 200 nodes and  
probably have a comparable NN.

Brian
