hadoop-hdfs-user mailing list archives

From João Paulo Forny <jpfo...@gmail.com>
Subject Re: Map and reduce slots
Date Mon, 14 Apr 2014 17:34:10 GMT
You need to differentiate slots from tasks.
Tasks are spawned by the TaskTracker (TT) and assigned to a free slot in
the cluster.
The number of map tasks for a Hadoop job is typically controlled by the
input data size and the split size.
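
As a rough illustration of that relationship, here is a minimal Python
sketch that estimates the number of map tasks from the input file sizes and
the split size. The function name and the example sizes are hypothetical,
and the sketch assumes file-based input where splits never span files. It
also ignores the small slack Hadoop allows on the last split of a file, so
treat the result as an estimate, not the exact number a job will report.

```python
def estimate_map_tasks(file_sizes_bytes, split_size_bytes):
    """Estimate map tasks as one task per input split, computed per file."""
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Integer ceiling division: a partial last split still needs a task.
            total += (size + split_size_bytes - 1) // split_size_bytes
    return total


MB = 1024 * 1024
# Hypothetical input: one 1 GB file and one 100 MB file, with 128 MB splits.
print(estimate_map_tasks([1024 * MB, 100 * MB], 128 * MB))  # prints 9
```

Note that the slot count does not appear anywhere in this calculation: the
number of map tasks depends on the data, not on the cluster.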
The number of reduce tasks for a Hadoop job is controlled by the
*mapreduce.job.reduces* parameter. If this parameter is not set, jobs have
one reduce task.
The number of map and reduce slots on each TaskTracker node is controlled
by the *mapreduce.tasktracker.map.tasks.maximum* and
*mapreduce.tasktracker.reduce.tasks.maximum* Hadoop properties in the
mapred-site.xml file. These parameters define the maximum number of
concurrently occupied slots on a TaskTracker node and determine the degree
of concurrency on each TaskTracker.
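
To make the slot/task distinction concrete, the Python sketch below
computes the cluster-wide slot count and the number of "waves" a job needs
when it has more tasks than slots. The helper names and the numbers are
hypothetical, and the sketch assumes identical TaskTracker nodes and a job
that has the whole cluster to itself.

```python
def cluster_slots(num_tasktrackers, slots_per_node):
    """Total slots when every TaskTracker has the same per-node maximum."""
    return num_tasktrackers * slots_per_node


def task_waves(num_tasks, total_slots):
    """Number of rounds needed to run num_tasks with total_slots at a time."""
    if total_slots <= 0:
        raise ValueError("cluster has no slots")
    # Integer ceiling division: a partly filled last round is still a round.
    return (num_tasks + total_slots - 1) // total_slots


# Hypothetical cluster: 10 TaskTrackers, each configured with 4 map slots.
map_slots = cluster_slots(10, 4)
print(map_slots)                   # prints 40
print(task_waves(100, map_slots))  # prints 3
```

The slot count comes only from the configuration and the number of nodes;
the task count comes from the job. A job with 100 map tasks still has 100
map tasks on this cluster, but only 40 of them run at any one time.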
Finally, it is important to consider the memory usage of each map and/or
reduce task. Task heap sizes are usually controlled by the
*mapred.child.java.opts* Hadoop parameter. If your Hadoop jobs are
memory-intensive and have large JVM heaps, then reduce the number of slots.
If your Hadoop jobs have small JVM heaps, you may be able to increase the
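number of slots. Keep in mind the maximum amount of memory that the task
JVMs consume if all slots are filled: roughly the total number of slots on
a node times the per-task heap size, which has to fit in that node's
physical memory.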
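
A quick way to sanity-check a configuration is to multiply the slot count
by the task heap size. The Python sketch below does that by reading the
-Xmx value out of an options string like the one in
*mapred.child.java.opts*. The function names and example values are
hypothetical. It counts heap only: each JVM uses some memory beyond its
heap, and the TaskTracker and DataNode daemons need memory too, so the real
footprint is higher than this figure.

```python
import re


def parse_xmx_mb(java_opts):
    """Return the -Xmx value from a JVM options string in MB, or None."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "m":
        return value
    if unit == "k":
        return value / 1024
    return value / (1024 * 1024)  # no suffix means bytes


def worst_case_heap_mb(map_slots, reduce_slots, java_opts):
    """Total task heap on one node if every slot is occupied at once."""
    xmx_mb = parse_xmx_mb(java_opts)
    if xmx_mb is None:
        raise ValueError("no -Xmx setting found in: " + java_opts)
    return (map_slots + reduce_slots) * xmx_mb


# Hypothetical node: 8 map slots, 4 reduce slots, 1 GB heap per task.
print(worst_case_heap_mb(8, 4, "-Xmx1024m"))  # prints 12288
```

If that total is close to or above the node's physical memory, either lower
the slot maximums or reduce the heap size.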

2014-04-14 14:18 GMT-03:00 Shashidhar Rao <raoshashidhar123@gmail.com>:

> Hi,
> Can somebody clarify what are map and reduce slots and how Hadoop
> calculates these slots. Are these slots calculated based on the number of
> splits?
> I am getting different answers please help
> Regards
> Shashidhar
