hadoop-general mailing list archives

From Chandraprakash Bhagtani <cpbhagt...@gmail.com>
Subject Re: multicore node clusters
Date Thu, 10 Sep 2009 13:31:29 GMT

You should definitely change mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum. If your tasks are CPU bound, run as
many tasks as there are CPU cores; otherwise you can run more tasks than
cores. You can check CPU and memory usage by running the "top" command on the
datanodes. You should also take care of the following configuration
parameters to achieve the best performance:

*mapred.compress.map.output:* Faster data transfer (from mappers to
reducers), less disk space used, and faster disk writes, at the cost of extra
time spent on compression and decompression.

*io.sort.mb:* If you have idle physical memory after running all tasks, you
can increase this value. But swap space should not be used, since swapping
slows the tasks down.

*io.sort.factor:* If your map tasks have a large number of spills, then you
should increase this value. It also helps with merging at the reducers.

*mapred.job.reuse.jvm.num.tasks:* The overhead of JVM creation for each task
is around 1 second. So for tasks which live for seconds or a few minutes
and have lengthy initialization, this value can be increased to gain
performance.

*mapred.reduce.parallel.copies:* For large jobs (jobs in which the map
output is very large), the value of this property can be increased, keeping in
mind that it will increase the total CPU usage.

*mapred.map/reduce.tasks.speculative.execution:* set to false to gain high
throughput.

*dfs.block.size*, *mapred.min.split.size*, or *mapred.max.split.size:* to
control the number of maps.
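
To make the effect of the last three properties concrete, here is a rough
sketch of the arithmetic. It assumes the split-size rule used by the newer
FileInputFormat API, max(minSize, min(maxSize, blockSize)); the older API
derives its split size from the mapred.map.tasks hint instead. The function
names and the 10 GB input are made up for illustration, and the real split
code lets the last split run slightly over, so treat the counts as estimates.

```python
def split_size(block_size, min_split, max_split):
    """Clamp the block size between the min and max split sizes."""
    return max(min_split, min(max_split, block_size))


def estimated_maps(file_size, block_size, min_split=1, max_split=2**63 - 1):
    """Approximate number of map tasks for one splittable input file."""
    size = split_size(block_size, min_split, max_split)
    return -(-file_size // size)  # integer ceiling division


MB = 1024 * 1024
GB = 1024 * MB

# Hypothetical 10 GB input stored with 64 MB blocks.
print(estimated_maps(10 * GB, 64 * MB))                      # one map per block
print(estimated_maps(10 * GB, 64 * MB, min_split=256 * MB))  # fewer, larger maps
print(estimated_maps(10 * GB, 64 * MB, max_split=16 * MB))   # more, smaller maps
```

Raising mapred.min.split.size above the block size gives fewer maps, while
lowering mapred.max.split.size below it gives more.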

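The slot advice and the parameters above could be collected into a
configuration fragment along these lines. This is only a sketch: the helper
names, the 1.5 oversubscription factor, the map/reduce split of the slots, and
every numeric value are assumptions for illustration, not recommendations.
Only the property names come from this thread.

```python
from xml.sax.saxutils import escape


def suggest_slots(cores, cpu_bound, oversubscribe=1.5):
    """One task per core if CPU bound, otherwise more tasks than cores.

    The oversubscribe factor is an assumed value; measure with "top".
    """
    return cores if cpu_bound else int(cores * oversubscribe)


def render_properties(props):
    """Render a dict as a Hadoop-style <configuration> XML fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


total = suggest_slots(cores=8, cpu_bound=True)
reduce_slots = max(1, total // 3)          # assumed split, not from the thread
map_slots = max(1, total - reduce_slots)

print(render_properties({
    "mapred.tasktracker.map.tasks.maximum": map_slots,
    "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    "mapred.compress.map.output": "true",
    "io.sort.mb": 200,                       # illustrative value
    "io.sort.factor": 25,                    # illustrative value
    "mapred.job.reuse.jvm.num.tasks": -1,    # -1 reuses a JVM without limit
    "mapred.reduce.parallel.copies": 10,     # illustrative value
    "mapred.map.tasks.speculative.execution": "false",
    "mapred.reduce.tasks.speculative.execution": "false",
}))
```

The two tasktracker maximums are read by each tasktracker from its own
configuration, so they are set per node; the others can be varied per job,
which makes them easier to benchmark.
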
On Thu, Sep 10, 2009 at 8:06 AM, Mat Kelcey <matthew.kelcey@gmail.com> wrote:

> > I've a cluster where every node is a multicore. From doing internet
> > searches I've figured out that I definitely need to change
> > mapred.tasktracker.tasks.maximum according to the number of clusters. But
> > there are definitely other things that I would like to change for example
> > mapred.map.tasks. Can someone point me out the list of things I should
> > change to get the best performance out of my cluster?
>
> nothing will give you better results than benchmarking with some jobs
> indicative to your domain!

Thanks & Regards,
Chandra Prakash Bhagtani,
