hadoop-general mailing list archives

From Grandl Robert <rgra...@yahoo.com>
Subject Re: Hadoop - how exactly is a slot defined
Date Mon, 22 Nov 2010 17:38:22 GMT
Thanks all for your comments.

However, I still have some doubts. 

Basically I can control the number of map/reduce slots with

but is it possible to set a different number of map/reduce slots for different slaves?

For example, if I am running in a heterogeneous environment where each slave has a different
configuration, is it possible to set a different number of slots based on each machine's
configuration?
For the moment I have observed that I can modify these parameters only on the master, so
all the nodes run with the same number of map/reduce slots regardless of the resources (CPU, memory)
each of them offers.

Thanks for any clue.


--- On Mon, 11/22/10, Harsh J <qwertymaniac@gmail.com> wrote:

From: Harsh J <qwertymaniac@gmail.com>
Subject: Re: Hadoop - how exactly is a slot defined
To: general@hadoop.apache.org
Date: Monday, November 22, 2010, 6:52 PM


On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rgrandl@yahoo.com> wrote:
> Hi all,
> I am having trouble understanding what exactly a slot is. We always talk about
> tasks assigned to slots, but I have not found anywhere what exactly a slot is. I assume it
> represents some allocation of RAM as well as some computation power.
> However, can somebody explain to me what exactly a slot means (in terms of resources allocated
> to a slot) and how this mapping (between slot and physical resources) is done in Hadoop?
> Or give me some hints about the files in Hadoop where it might be?

There are two types of slot -- Map slots and Reduce slots. A slot represents
the ability to run one of these "Tasks" (map/reduce tasks) individually
at any given time. Therefore, multiple slots on a TaskTracker mean that
multiple "Tasks" may execute in parallel.
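
As a rough illustration of the idea above, here is a toy model of one
TaskTracker's slots. It is only a sketch and not actual Hadoop code: the class
name TaskTrackerSlots and its methods are hypothetical, made up for this
example. It shows a slot as nothing more than a counted permission to run one
task of a given kind.

```python
from dataclasses import dataclass


@dataclass
class TaskTrackerSlots:
    """Toy model of one TaskTracker's fixed slot counts (hypothetical, not Hadoop code)."""
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def try_start(self, kind: str) -> bool:
        """Occupy a free slot of the given kind; return False if none is free."""
        if kind == "map" and self.running_maps < self.map_slots:
            self.running_maps += 1
            return True
        if kind == "reduce" and self.running_reduces < self.reduce_slots:
            self.running_reduces += 1
            return True
        return False

    def finish(self, kind: str) -> None:
        """Release the slot held by a finished task of the given kind."""
        if kind == "map" and self.running_maps > 0:
            self.running_maps -= 1
        elif kind == "reduce" and self.running_reduces > 0:
            self.running_reduces -= 1
```

With 2 map slots, a third map task is refused until one of the running map
tasks finishes. Map and reduce slots are counted separately, so a free reduce
slot cannot be used by a map task in this model.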

Right now the total number of slots on a TaskTracker is
mapred.tasktracker.map.tasks.maximum for Maps and
mapred.tasktracker.reduce.tasks.maximum for Reduces.
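
A small sketch of how these two properties add up across a cluster. It assumes
that each TaskTracker reads the two values from its own local configuration,
which is an assumption of this example and not something stated in this thread.
The host names and numbers are made up, and cluster_slots is a hypothetical
helper, not part of Hadoop.

```python
MAP_KEY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_KEY = "mapred.tasktracker.reduce.tasks.maximum"


def cluster_slots(tracker_configs):
    """Sum the per-TaskTracker maximums into (total map slots, total reduce slots).

    tracker_configs maps a host name to that TaskTracker's own property dict.
    """
    total_maps = sum(int(conf[MAP_KEY]) for conf in tracker_configs.values())
    total_reduces = sum(int(conf[REDUCE_KEY]) for conf in tracker_configs.values())
    return total_maps, total_reduces


# Hypothetical heterogeneous cluster: one large node and one small node.
configs = {
    "node-a": {MAP_KEY: "8", REDUCE_KEY: "4"},
    "node-b": {MAP_KEY: "2", REDUCE_KEY: "1"},
}
print(cluster_slots(configs))
```

The values are kept as strings on purpose, since configuration properties are
text and have to be parsed into integers before they can be summed.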

Hadoop is indeed trying to move towards a dynamic slot concept, which
could rely on the resources currently available on a system, but work
on this is still in the conceptual phase. TaskTrackers already emit system
status (like CPU load, utilization, memory available/used, load
averages) in their heartbeats today (which is utilized by certain
schedulers; I think the Capacity Scheduler uses it to determine stuff),
but the number of slots is still fixed at the maximums set by the above two
configuration properties on each TaskTracker.

For code on how slots are checked/utilized, see any Scheduler plugin's
code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for

> Thanks a lot,
> Robert

Harsh J
