hadoop-mapreduce-user mailing list archives

From Eric <eric.x...@gmail.com>
Subject Re: Choosing number of map/reduce slots (with hyperthreading)
Date Mon, 10 Jan 2011 08:06:38 GMT
With hyperthreading, the CPU tries to avoid sitting idle by running the
extra thread whenever it has spare cycles. It can do so cheaply, since
switching between hardware threads is much faster than a full context
switch. So, as Arun suggests, it probably won't hurt as long as you have
enough memory in your nodes. Your CPU will be able to use all its power;
individual jobs might take a bit longer to finish, but more jobs will be
running at the same time. That will probably be faster than wasting CPU
cycles while processes wait on I/O when the CPU could be running the other
thread. The best way to find out would be to test this. If you do, please
report back to us. I'm very curious about the results!
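Eric's and Arun's points combine into a simple sizing check: the total heap of all concurrent task JVMs must fit in physical RAM, because the JVM reserves its full heap and over-committing means swapping. A minimal sketch of that arithmetic, with hypothetical numbers (not figures from this thread):

```python
# Hypothetical sizing check: total task-JVM heap must fit in RAM.
# All numbers below are illustrative, not recommendations.
def max_slots(ram_mb, reserved_mb, heap_per_task_mb):
    """Task slots that fit in RAM after reserving memory for the OS
    and the DataNode/TaskTracker daemons (and e.g. HBase, if present)."""
    return (ram_mb - reserved_mb) // heap_per_task_mb

# 16 GB node, 4 GB reserved for OS/daemons, 1.5 GB heap per task:
slots = max_slots(16384, 4096, 1536)
print(slots)  # -> 8: matches the physical cores, not the hyperthreads
```

If the slot count that fits in RAM is already at or below the core count, enabling more slots for hyperthreading would only push the node toward thrashing.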

2011/1/9 Arun C Murthy <acm@yahoo-inc.com>

> Hyperthreading is interesting, but I'd put more emphasis on the amount of
> RAM you have on your boxes.
> The JVM allocates its entire heap-size upfront, which means your node will
> start thrashing on RAM if you put too many tasks per node.
> Arun
> On Jan 6, 2011, at 5:51 PM, Adam Phelps wrote:
>> By scouring various web pages and lists via Google I've found some
>> general recommendations when it comes to setting the number of map and
>> reduce slots for a cluster.  It seems to come down to setting them to
>> roughly the number of cores on the machine, minus some if there will be
>> other processes active (such as HBase region servers), and to set the
>> per-task memory usage so that the total will stay below that of the
>> system.  Is this a reasonably general heuristic?
>> One thing I haven't been able to find advice on is whether this
>> heuristic should be adjusted for machines that have hyperthreading
>> enabled.  My thought is that it wouldn't be beneficial to increase the
>> number of slots (especially in a CPU-bound application) as slots equal
>> to the # of cores would already be fully utilizing the CPU.  Are there
>> alternative thoughts regarding that?
>> - Adam
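
The heuristic Adam describes maps onto the TaskTracker settings of that era (Hadoop 0.20/1.x) roughly as below. This is a sketch with example values for a hypothetical 8-core, 16 GB node; the specific numbers are assumptions for illustration, not recommendations from the thread:

```xml
<!-- mapred-site.xml: example slot/heap sizing (illustrative values) -->
<configuration>
  <property>
    <!-- map slots roughly equal to cores, minus a couple for the
         DataNode/TaskTracker daemons or an HBase region server -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <!-- per-task JVM heap; keep (map slots + reduce slots) * heap well
         below physical RAM, since over-committing the reserved heaps
         leads to the thrashing Arun describes -->
    <name>mapred.child.java.opts</name>
    <value>-Xmx1536m</value>
  </property>
</configuration>
```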
