hadoop-mapreduce-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Re: About slots of tasktracker and munber of map taskers
Date Mon, 12 Dec 2011 12:03:34 GMT
Hi Tan,

Adding on to Harsh's response.

*Map Reduce Slots*
Slots are the maximum number of map and reduce tasks that can run
concurrently on each node, and therefore on your cluster. Say you have a
10-node cluster (10 data nodes): each node is assigned a specific number of
map and reduce tasks it can handle concurrently. The number need not be the
same for all nodes; it can vary with each node's hardware capacity. The
admin assigns these values based on the hardware (CPU, memory, ...) of each
node so that the box can handle the resource requirements gracefully. If you
set these values too high (assigning more slots), you are asking the box to
run more simultaneous tasks than it can handle. That results in memory
swapping, OOM errors, CPU starvation, etc., and you end up with an
inefficient cluster that sees a large number of task failures. Assuming all
machines have the same capacity, if each machine has 8 map and 2 reduce
slots, then the total capacity of your cluster is 8*10=80 map slots and
2*10=20 reduce slots, which means your cluster can run at most 80 map tasks
and 20 reduce tasks at a time.
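
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration: the `nodes` list and the `cluster_capacity` helper are
hypothetical names, not part of Hadoop, and the per-node values are the
example numbers from this mail.

```python
# Hypothetical description of a 10-node cluster where every node is
# configured with 8 map slots and 2 reduce slots (the example above).
nodes = [{"map_slots": 8, "reduce_slots": 2} for _ in range(10)]


def cluster_capacity(nodes):
    """Sum the per-node slot counts to get the cluster-wide capacity."""
    total_map = sum(n["map_slots"] for n in nodes)
    total_reduce = sum(n["reduce_slots"] for n in nodes)
    return total_map, total_reduce


print(cluster_capacity(nodes))  # (80, 20)
```

Because the totals are a plain sum over nodes, the same helper also covers
clusters where the nodes have different slot counts.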

*Map Reduce Tasks*
This refers to the actual tasks spawned by your MapReduce jobs. Say I fire
two jobs on the cluster above, one after the other. The first job spawns 60
mappers and the second spawns 40. As soon as the first job starts, 60 of
the 80 map slots are occupied, leaving 20 free. When I trigger the second
job it has 40 map tasks but only 20 slots are available, so 20 map tasks
start and the remaining 20 wait in the queue. Once slots become free, these
tasks can execute.
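
A toy model of this two-job scenario, assuming jobs are served strictly in
submission order and ignoring locality and any scheduler policy. The
function name `assign_map_slots` is made up for the illustration; it is not
a Hadoop API.

```python
def assign_map_slots(total_slots, jobs):
    """Hand out free map slots to jobs in submission order.

    jobs is a list of (name, map_task_count) tuples.
    Returns {name: (running_tasks, queued_tasks)}.
    """
    free = total_slots
    state = {}
    for name, tasks in jobs:
        running = min(tasks, free)  # a job cannot use more slots than are free
        free -= running
        state[name] = (running, tasks - running)
    return state


state = assign_map_slots(80, [("job1", 60), ("job2", 40)])
print(state)  # {'job1': (60, 0), 'job2': (20, 20)}
```

The 20 queued tasks of the second job would start as running tasks finish
and release their slots; the sketch only captures the moment right after
both jobs are submitted.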

In short, the map and reduce slots are set by the admin on a per-node
basis, based on hardware. They are not set at the individual task level,
and the developer need not worry about these parameters at the job level.
The MapReduce developer writes the application, and the job fires map and
reduce tasks based on its input splits and InputFormat. The number of tasks
varies with your inputs and jobs. These tasks are then executed on the
cluster based on the availability of slots (assigned by the admin) and
factors like data/rack locality.

Coming to your question:

> As an administrator, I can set the max number of maps/reduces run on a
> datanode,
> then what I set the number of slots for?

The maximum number of maps/reduces that can run on a data node at the same
time is exactly what we call the map and reduce slots for that data node.
They are the same setting, not two different attributes.
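
To make this concrete, here is a small sketch that reads the slot counts
from the text of a mapred-site.xml. It assumes the Hadoop 1.x-era property
names mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum and falls back to the default of 2
mentioned earlier in this thread. The `SAMPLE` document and the
`slots_from_config` helper are hypothetical, written only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml content for one TaskTracker.
SAMPLE = """<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>"""

DEFAULT_SLOTS = 2  # default of 2 map + 2 reduce slots, as stated in the thread


def slots_from_config(xml_text):
    """Return (map_slots, reduce_slots) configured for a TaskTracker."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    map_slots = int(props.get("mapred.tasktracker.map.tasks.maximum", DEFAULT_SLOTS))
    reduce_slots = int(props.get("mapred.tasktracker.reduce.tasks.maximum", DEFAULT_SLOTS))
    return map_slots, reduce_slots


print(slots_from_config(SAMPLE))  # (8, 2)
```

The point is that the "max maps/reduces per node" and the "slots" come from
the same two configuration values; there is no separate slot setting.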


Hope this clarifies.

Regards
Bejoy.K.S


2011/12/12 Tan Jun <tanjun_2525@163.com>

> Harsh,
> Sorry for my poor English.
> There is one more question.
> As an administrator, I can set the max number of maps/reduces run on a
> datanode,
> then what I set the number of slots for?
> What's the differences between these attributes?
> In my opinion ,the number of  slot depends on hardware while maps/reduces
> on software.
> Assuming that only one job is running, especially for benchmarking case PI
> computing.
> Thanks!
>
> ------------------------------
> Tan Jun
>
>  *From:* Harsh J <harsh@cloudera.com>
> *Date:* 2011-12-12 13:33
> *To:* mapreduce-user <mapreduce-user@hadoop.apache.org>; tanjun_2525<tanjun_2525@163.com>
> *Subject:* Re: Re: About slots of tasktracker and munber of map taskers
>  Tan,
>
> As an admin, I can even choose to set configuration to even 100 slots
> on a 4-core node, if I feel like burning the box. There is no hardware
> auto-detection, and the slot limit is entirely controlled by the
> mapred-site.xml for that TaskTracker.
>
> The book merely tries to tell that you need to set these maximum slot
> settings based on your hardware knowledge on each node -- TaskTrackers
> do nothing of that sort on their own.
>
> There is some CPU/Memory considerations taken into account by a
> variety of non-default Schedulers in JobTracker, but your slot limits
> per tasktracker is entirely controlled by configuration.
>
> 2011/12/12 Tan Jun <tanjun_2525@163.com>:
> > Hi Harsh,
>
> > Now I know the number of maps and reduces run simultaneously is set by the
> > administrator in mapred-site.xml with default value 2.
> > But I cant get the point about number of slots.
> > For my understanding by now,
> > the number of slots is decides by hardware that administrator cannot
> > change.
> > Is that wright?
> >
> > ________________________________
> > Tan Jun
> >
> > From: Harsh J
> > Date:?011-12-12?2:22
> > To: mapreduce-user
> > Subject: Re: About slots of tasktracker and munber of map taskers
> > Hi Tan,
> >
> > On 12-Dec-2011, at 8:48 AM, Tan Jun wrote:
> >
> > Hi,
> > I dont really understand the meaning of the sentences in "The Definitive
> > Guide"(page 155):
> >
>
> > Tasktrackers have a fixed number of slots for map tasks and for reduce tasks: for example,
> > a tasktracker may be able to run two map tasks and two reduce tasks simultaneously.
> > (The precise number depends on the number of cores and the amount of
> > memory on the tasktracker; see “Memory” on page?54.)
> >
> > Does that mean the number of slots is fixed and the number of maps run
> > simultaneously is set by user?
> >
> >
>
> > Not by the user, but by the administrator. Each tasktracker is configured in
>
> > production with a 'task slot' upper limit - say, 8 maps and 4 reducers for a
> > 12-core machine. This is not auto-configured (unless you use auto cluster
>
> > setup+configuration tools that determine it for you [0]), and has to be set
> > when configuring Hadoop daemons.
> >
>
> > The book means to imply that you need to set these, based on the memory and
>
> > CPU configuration of your machines. By default, tasktrackers have limits of
> > 2+2.
> >
> > See http://wiki.apache.org/hadoop/LimitingTaskSlotUsage
> >
> > [0] - http://www.cloudera.com/products-services/tools/ is one.
>
>
>
> --
> Harsh J
>
