hadoop-common-user mailing list archives

From Owen O'Malley <omal...@apache.org>
Subject Re: Terasort problem
Date Sun, 11 Jul 2010 17:39:28 GMT

On Jul 10, 2010, at 4:29 AM, Tonci Buljan wrote:

> mapred.tasktracker.reduce.tasks.maximum <- Is this configured on every
> datanode separately? What number shall I put here?
>
> mapred.tasktracker.map.tasks.maximum <- same question  as
> mapred.tasktracker.reduce.tasks.maximum

Generally, RAM is the scarce resource. These maximums are set per tasktracker, so decide how you want to divide each worker's RAM between task slots. With 6 GB of RAM, I'd probably make 4 map slots of 0.75 GB each and 2 reduce slots of 1.5 GB each.
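
As a sketch, that split would look something like this in each tasktracker's mapred-site.xml (the slot counts match the 6 GB worker above; mapred.child.java.opts sets the per-task JVM heap and applies to both maps and reduces in this version, so the 768m value here is just illustrative):

<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>    <!-- 4 map slots on this worker -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>    <!-- 2 reduce slots on this worker -->
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx768m</value>   <!-- per-task JVM heap; illustrative value -->
  </property>
</configuration>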

> mapred.reduce.tasks <- Is this configured ONLY on Namenode and what value
> should it have for my 8 node cluster?

You should set it to your reduce task capacity of 2 * 8 = 16.
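
It's a job-level setting, so it is read where you submit the job. For example, in the mapred-site.xml on the machine you run terasort from (or, since the example uses ToolRunner, you can likely pass -Dmapred.reduce.tasks=16 on the command line), something like:

<property>
  <name>mapred.reduce.tasks</name>
  <value>16</value>   <!-- 2 reduce slots x 8 workers -->
</property>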

> mapred.map.tasks <- same question as mapred.reduce.tasks

It matters less, but go ahead and set it to the map capacity of 4 * 8 = 32. It's more important to set the JVM heap and buffer sizes for the tasks. You also want to set your HDFS block size to 0.5G to 2G; that will make your map inputs the right size.
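
As a rough sketch, the block size goes in hdfs-site.xml (dfs.block.size is in bytes) and the map-side sort buffer in mapred-site.xml; the exact numbers here are only illustrative:

<!-- hdfs-site.xml: 512 MB blocks so each map gets a good-sized input split -->
<property>
  <name>dfs.block.size</name>
  <value>536870912</value>
</property>

<!-- mapred-site.xml: in-memory sort buffer for map outputs -->
<property>
  <name>io.sort.mb</name>
  <value>256</value>
</property>

Note the block size only applies to files written after the change, so set it before you run teragen to generate the input.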

-- Owen



