hadoop-general mailing list archives

From Grandl Robert <rgra...@yahoo.com>
Subject Re: Hadoop - how exactly is a slot defined
Date Wed, 24 Nov 2010 16:53:01 GMT
Hi,
I am sorry to bother you again about this subject, but I am still not convinced
about what Hadoop assumes a slot is. I understand it represents something in terms
of CPU/memory, so you have to allocate a corresponding number of map/reduce slots
based on your configuration.

BUT, I still cannot understand whether Hadoop itself makes any mapping between the
concept of a slot and physical resources, or whether slots are just numbers and
Hadoop only ever works with those numbers.

I looked at the code, but I was not able to figure out whether Hadoop really does
any checking of the number of slots against physical resources, or whether it is
simply limited by the two numbers (the maximum numbers of map slots and reduce
slots) and plays with those numbers only. That would mean the user has to supply
the interpretation of what a slot really is (one slot per core, one slot per
512 MB, etc.) when configuring the number of map/reduce slots on his machines.

Thanks in advance for any clue.

Cheers,
Robert

--- On Mon, 11/22/10, Harsh J <qwertymaniac@gmail.com> wrote:

From: Harsh J <qwertymaniac@gmail.com>
Subject: Re: Hadoop - how exactly is a slot defined
To: general@hadoop.apache.org
Date: Monday, November 22, 2010, 6:52 PM

Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rgrandl@yahoo.com> wrote:
> Hi all,
>
> I have trouble understanding what exactly a slot is. We always talk about tasks
> assigned to slots, but I did not find anywhere what exactly a slot is. I assume
> it represents some allocation of RAM as well as some computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of the
> resources allocated to a slot) and how this mapping (between a slot and physical
> resources) is done in Hadoop? Or give me some hints about the files in Hadoop
> where it might be?

A slot is of two types -- a map slot and a reduce slot. A slot represents the
ability to run one such "Task" (a map or a reduce task) at a point in time.
Therefore, multiple slots on a TaskTracker mean that multiple tasks may execute
in parallel.

Right now the total number of slots on a TaskTracker is simply
mapred.tasktracker.map.tasks.maximum for maps and
mapred.tasktracker.reduce.tasks.maximum for reduces.
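
For instance, on each TaskTracker node you would set something like this in its
mapred-site.xml (the values here are just an example; the default is 2 of each,
if I recall correctly):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

Hadoop does not derive these values from the hardware or validate them against
it; rules of thumb like "one slot per core" or "one slot per 512 MB" are
interpretations the admin applies, exactly as you guessed.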

Hadoop is indeed trying to move towards a dynamic slot concept, which could rely
on the resources currently available on a system, but work on this is still in a
conceptual phase. TaskTrackers already emit system status (such as CPU load,
utilization, available memory, load averages) in their heartbeats today, and
certain schedulers make use of it (I think the Capacity Scheduler uses it when
making decisions), but the number of slots is still fixed as a maximum by the
above two configurations on each TaskTracker.

For code on how slots are checked and utilized, see any scheduler plugin's
code -- LimitTasksPerJobTaskScheduler or CapacityTaskScheduler, for example.
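
To give a flavour of what such a scheduler does with slots, here is a rough
sketch in plain Java -- these are not the real Hadoop classes or method names,
only an illustration of the bookkeeping: a slot is merely a counter compared
against the configured maximum, with no reference to cores or RAM.

  // Illustrative sketch only -- not the actual Hadoop scheduler code.
  public class SlotAccounting {
      private final int maxMapSlots;     // mapred.tasktracker.map.tasks.maximum
      private final int maxReduceSlots;  // mapred.tasktracker.reduce.tasks.maximum
      private int runningMaps;
      private int runningReduces;

      public SlotAccounting(int maxMapSlots, int maxReduceSlots) {
          this.maxMapSlots = maxMapSlots;
          this.maxReduceSlots = maxReduceSlots;
      }

      // A scheduler may hand out a new map task only while a map slot is free.
      public boolean hasFreeMapSlot()    { return runningMaps < maxMapSlots; }
      public boolean hasFreeReduceSlot() { return runningReduces < maxReduceSlots; }

      public void mapStarted()     { runningMaps++; }
      public void mapFinished()    { runningMaps--; }
      public void reduceStarted()  { runningReduces++; }
      public void reduceFinished() { runningReduces--; }

      public static void main(String[] args) {
          SlotAccounting tt = new SlotAccounting(4, 2); // 4 map slots, 2 reduce slots
          while (tt.hasFreeMapSlot()) {
              tt.mapStarted();                          // "assign" map tasks until full
          }
          System.out.println(tt.hasFreeMapSlot());      // prints: false
      }
  }

The real schedulers do this bookkeeping against the status each TaskTracker
reports in its heartbeat, but the principle is the same: the cap is just the
number you configured.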

>
> Thanks a lot,
> Robert
>
>
>



-- 
Harsh J
www.harshj.com


