hadoop-common-user mailing list archives

From Ross Boucher <bouc...@apple.com>
Subject Re: hardware specs for hadoop nodes
Date Tue, 25 Sep 2007 17:18:57 GMT

On Sep 25, 2007, at 10:09 AM, Michael Bieniosek wrote:

> For our CPU-bound application, I set the value of  
> mapred.tasktracker.tasks.maximum (number of map tasks per  
> tasktracker) equal to the number of CPUs on a tasktracker.   
> Unfortunately, I think this value has to be set per cluster, not  
> per machine.  This is okay for us because our machines have similar  
> hardware, but it might be a problem if your machines have different  
> numbers of CPUs.
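
For reference, the cluster-wide cap Michael describes would be set in hadoop-site.xml on the tasktrackers. A minimal sketch, assuming a Hadoop 0.14-era configuration (the property name is the one used in this thread; later releases split it into separate map and reduce maximums):

```xml
<!-- hadoop-site.xml: cluster-wide cap on concurrent tasks per
     tasktracker. Every tasktracker reads the same value, which is
     why a heterogeneous cluster is awkward to tune. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>4</value> <!-- e.g. one task per core on a quad-core box -->
</property>
```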

I did some experimentation with the number of tasks per machine on a  
set of quad core boxes.  I couldn't figure out how to change this  
value without stopping and restarting the cluster, nor how to tune  
it on a per-machine basis (though the latter didn't matter much in  
my case).

My test had no reduce phase, so I simply set the reduce count to 1  
per machine for all the tests.  On the quad core boxes, 5 map tasks  
per machine performed best, but only marginally better than 4 (about  
4% faster with just one box in the cluster, 2% with 4 boxes).  At 6  
tasks, performance started to trend back in the other direction.
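
The reduce count mentioned above is a per-job setting rather than a tasktracker property. A sketch of how it would typically be fixed in the job configuration (assuming a 4-machine cluster, so "1 reduce per machine" means 4 total; the value shown is illustrative):

```xml
<!-- Job configuration: total number of reduce tasks for the job.
     Set to (number of machines) x (reduces per machine), here 4 x 1. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>
```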

> I created HADOOP-1245 a long time ago for this problem, but I've  
> since heard that hadoop uses only the cluster value for maps per  
> tasktracker, not the hybrid model I describe.  In any case, I never  
> did any work on fixing it because I don't need heterogeneous clusters.
> -Michael
> On 9/25/07 9:37 AM, "Ted Dunning" <tdunning@veoh.com> wrote:
> On 9/25/07 9:27 AM, "Bob Futrelle" <bob.futrelle@gmail.com> wrote:
>> How does Hadoop handle multi-core CPUs?  Does each core run a  
>> distinct copy
>> of the mapped app?  Is this automatic, or need some configuration,  
>> or what?
> Works fine.  You need to tell it how many maps to run per machine.   
> I expect
> that this can be tuned per machine.
>> Or should I just spread Hadoop over some friendly machines already  
>> in my
>> College, buying nothing?
> Or both?  You will get interesting results all three ways.
