hadoop-general mailing list archives

From Grandl Robert <rgra...@yahoo.com>
Subject Re: Hadoop - how exactly is a slot defined
Date Thu, 25 Nov 2010 12:42:50 GMT
Thanks to you all for the explanations.
So, as far as I understand, if I configure 4 map slots per node (let's say 512 MB of RAM per
slot, as my node has 2 GB in total), Hadoop will always try to use all 4 slots? Does
the node report in its heartbeat that it has 4 free slots?
But then my question is: what if another workload competes with the Hadoop workload at some point,
so that fewer resources are available to Hadoop? Does Hadoop still report that it has 4 slots
free and implicitly try to schedule tasks into those 4 slots?
Thank you again for your prompt answers.
--- On Wed, 11/24/10, Jonathan Creasy <jon.creasy@Announcemedia.com> wrote:

From: Jonathan Creasy <jon.creasy@Announcemedia.com>
Subject: Re: Hadoop - how exactly is a slot defined
To: "general@hadoop.apache.org" <general@hadoop.apache.org>
Date: Wednesday, November 24, 2010, 7:04 PM


Hadoop is not currently doing any dynamic detection of resources to determine the number of
slots. If I told Hadoop it could run 3,587 map tasks, it might well try to do it. 
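Robert's scenario above (4 map slots, with another workload taking memory away) can be modeled in a few lines. This is a toy, not Hadoop's TaskTracker code, and every class and field name in it is hypothetical. The point it shows is that the advertised free-slot count is derived only from the configured maximum and the number of running tasks, so contention from outside Hadoop does not change it:

```python
from dataclasses import dataclass


@dataclass
class ToyTaskTracker:
    """Hypothetical model of a node with a fixed, configured slot count."""

    max_map_slots: int       # stands in for mapred.tasktracker.map.tasks.maximum
    running_maps: int = 0    # map tasks currently executing on this node
    free_memory_mb: int = 0  # observed free RAM; deliberately unused below

    def free_map_slots(self) -> int:
        # Only the configured maximum and the running count matter;
        # free_memory_mb is ignored, mirroring "no dynamic detection of resources".
        return max(self.max_map_slots - self.running_maps, 0)


# Robert's example: 4 map slots on a 2 GB node.
idle = ToyTaskTracker(max_map_slots=4, free_memory_mb=2048)
# Same node while another workload has taken most of the RAM.
contended = ToyTaskTracker(max_map_slots=4, free_memory_mb=256)
print(idle.free_map_slots(), contended.free_map_slots())  # → 4 4
```

Both nodes advertise 4 free slots; likewise, a configured maximum of 3,587 would simply be reported back as 3,587 free slots.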

We use our own standards to determine how many map and reduce tasks a node is allowed to run:

Each Map/Reduce Task is given:
2GB of Ram
1 Core
50GB of tmp disk space

The formula for map/reduce slots looks something like this in our environment:

G = GB of Ram
D = Disk space in /tmp
C = count of CPU cores

The minimum of: 

These numbers aren't published anywhere and may completely fly in the face of conventional
wisdom, but they are what we are using and, so far, they seem to work for us.
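The terms of the minimum are not listed above. Assuming they are simply the three per-task allowances (G/2, D/50 and C, an inference from the list rather than a stated formula), the number of concurrent map/reduce tasks a node can host is set by its scarcest resource. A minimal sketch; the function name is hypothetical, and how the total is split between map and reduce slots is not specified above:

```python
def slot_budget(ram_gb: float, tmp_disk_gb: float, cpu_cores: int) -> int:
    """Hypothetical reconstruction: each task gets 2 GB RAM, 1 core, 50 GB of /tmp.

    The node can run as many concurrent tasks as its scarcest resource allows.
    """
    RAM_PER_TASK_GB = 2
    TMP_PER_TASK_GB = 50
    CORES_PER_TASK = 1
    by_ram = int(ram_gb // RAM_PER_TASK_GB)        # G / 2
    by_disk = int(tmp_disk_gb // TMP_PER_TASK_GB)  # D / 50
    by_cpu = cpu_cores // CORES_PER_TASK           # C
    return min(by_ram, by_disk, by_cpu)


# Example node: 16 GB RAM, 500 GB in /tmp, 8 cores -> min(8, 10, 8) = 8 tasks
print(slot_budget(16, 500, 8))  # → 8
```

Rounding each term down keeps a node from being promised a partial task's worth of RAM, disk, or CPU.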


On Nov 24, 2010, at 10:53 AM, Grandl Robert wrote:

> Hi,
> I am sorry to bother you again about this subject, but I am still not convinced about what
> Hadoop assumes a slot is. I understood that it represents something in terms of CPU/memory, so you have
> to allocate the corresponding number of map/reduce slots based on your configuration.
> BUT, I still cannot understand whether Hadoop itself makes any mapping between the concept of a slot
> and physical resources, or whether slots are just numbers that you have to work with as given.
> I looked at the code, but I am not able to figure out whether Hadoop really does some checking
> between the number of slots and the physical resources, or whether it is just limited by the 2 numbers (the maximum
> number of map slots and of reduce slots) and works with these numbers only. That would mean the user
> has to decide what a slot really is (only one slot per core, one
> slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.
> Thanks in advance for any clue.
> Cheers,
> Robert
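Robert's reading is the one the rest of the thread confirms: the per-node slot policy lives entirely in two configuration values, mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum (the property names given in Harsh's reply). As a minimal sketch, assuming the usual Hadoop `<configuration>`/`<property>` XML layout of a node's mapred-site.xml, hypothetical helpers that turn a rule such as "one slot per 512 MB" into those two entries could look like this (the helper names and the reduce-slot count are illustrative only):

```python
import xml.etree.ElementTree as ET


def slots_for_memory(total_mb: int, mb_per_slot: int) -> int:
    """Hypothetical rule of thumb: one slot per fixed slice of RAM."""
    return total_mb // mb_per_slot


def slot_config_xml(map_slots: int, reduce_slots: int) -> str:
    """Render the two per-TaskTracker slot limits as Hadoop configuration XML."""
    conf = ET.Element("configuration")
    for name, value in (
        ("mapred.tasktracker.map.tasks.maximum", map_slots),
        ("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


# Robert's node: 2 GB of RAM at 512 MB per slot -> 4 map slots.
# The reduce count (2) is an arbitrary example value.
map_slots = slots_for_memory(2048, 512)
print(slot_config_xml(map_slots, reduce_slots=2))
```

Nothing in this output ties the numbers to the hardware: 4 map plus 2 reduce slots at 512 MB each would exceed the 2 GB node, and, as Jonathan notes, Hadoop would still try to use them.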
> --- On Mon, 11/22/10, Harsh J <qwertymaniac@gmail.com> wrote:
> From: Harsh J <qwertymaniac@gmail.com>
> Subject: Re: Hadoop - how exactly is a slot defined
> To: general@hadoop.apache.org
> Date: Monday, November 22, 2010, 6:52 PM
> Hi,
> On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rgrandl@yahoo.com> wrote:
>> Hi all,
>> I have trouble understanding what exactly a slot is. We always talk about
>> tasks being assigned to slots, but I have not found anywhere what exactly a slot is. I assume it
>> represents some allocation of RAM as well as some computing power.
>> However, can somebody explain to me what exactly a slot means (in terms of the resources
>> allocated to a slot) and how this mapping (between a slot and physical resources) is done in
>> Hadoop? Or give me some hints about which files in Hadoop it might be in?
> There are two types of slot: Map slots and Reduce slots. A slot represents
> the ability to run one of these "Tasks" (map/reduce tasks) individually
> at a given point in time. Therefore, multiple slots on a TaskTracker mean
> multiple "Tasks" may execute in parallel.
> Right now the total number of slots on a TaskTracker is fixed by
> mapred.tasktracker.map.tasks.maximum for Maps and
> mapred.tasktracker.reduce.tasks.maximum for Reduces.
> Hadoop is indeed moving towards a dynamic slot concept, which
> could rely on the resources currently available on a system, but work
> on this is still in the conceptual phase. TaskTrackers already emit system
> status (such as CPU load, utilization, available/used memory, and load
> averages) in their heartbeats, and this is used by certain
> schedulers (I think the Capacity Scheduler uses it to make some decisions),
> but the number of slots is still fixed at the maximum given by the above two
> configuration properties on each TaskTracker.
> For code on how slots are checked/utilized, see any Scheduler plugin's
> code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
> example.
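Hadoop's schedulers are Java plugins, and their real interfaces are not reproduced here. The Python toy below only illustrates the behaviour described above, and all class, field, and method names in it are hypothetical. The heartbeat carries load and memory figures, but the number of tasks handed back is bounded solely by free slots, with an optional per-job cap loosely in the spirit of what the name LimitTasksPerJobTaskScheduler suggests:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Heartbeat:
    """Hypothetical heartbeat: fixed slot limits plus informational load figures."""

    tracker: str
    max_map_slots: int
    running_maps: int
    load_average: float  # reported, but ignored by this toy scheduler
    free_memory_mb: int  # reported, but ignored by this toy scheduler


@dataclass
class ToyScheduler:
    """Fills free map slots from a FIFO queue, capping running tasks per job.

    Task completions are not modeled, so per-job counts only grow.
    """

    max_running_per_job: int
    pending: List[Tuple[str, str]] = field(default_factory=list)  # (job_id, task_id)
    running_per_job: Dict[str, int] = field(default_factory=dict)

    def assign_tasks(self, hb: Heartbeat) -> List[str]:
        free = max(hb.max_map_slots - hb.running_maps, 0)  # slots only, no load check
        assigned: List[str] = []
        remaining: List[Tuple[str, str]] = []
        for job_id, task_id in self.pending:
            running = self.running_per_job.get(job_id, 0)
            if len(assigned) < free and running < self.max_running_per_job:
                assigned.append(task_id)
                self.running_per_job[job_id] = running + 1
            else:
                remaining.append((job_id, task_id))
        self.pending = remaining
        return assigned


sched = ToyScheduler(
    max_running_per_job=2,
    pending=[("job1", "m1"), ("job1", "m2"), ("job1", "m3"), ("job2", "m4")],
)
# A heavily loaded node with little free memory still gets tasks for its 4 free slots.
hb = Heartbeat("node1", max_map_slots=4, running_maps=0, load_average=7.5, free_memory_mb=128)
assigned = sched.assign_tasks(hb)
print(assigned)  # → ['m1', 'm2', 'm4']
```

Task m3 stays queued only because job1 hit the per-job cap; the high load average and low free memory never enter the decision.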
>> Thanks a lot,
>> Robert
> -- 
> Harsh J
> www.harshj.com
