hadoop-hdfs-user mailing list archives

From Dieter De Witte <drdwi...@gmail.com>
Subject Re: Mappers vs. Map tasks
Date Tue, 25 Feb 2014 07:49:34 GMT
Each node has a tasktracker with a number of map slots. A map slot hosts a
mapper, and a mapper executes map tasks, one at a time. If there are more
map tasks than slots, there will be multiple rounds (waves) of mapping.
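As a rough illustration of these rounds, here is a small Java sketch that counts how many waves of map tasks a job needs. It assumes every slot is free when the job starts and that all map tasks take about the same time; the cluster sizes are made up, and `waves` is a hypothetical helper, not a Hadoop API. (In Hadoop 1.x the per-node slot count is set by `mapred.tasktracker.map.tasks.maximum`.)

```java
// Simplified model: how many rounds ("waves") of map tasks a job needs
// when there are more map tasks than map slots in the cluster.
public class MapWaves {

    // Each wave runs at most one map task per slot.
    static int waves(int mapTasks, int nodes, int mapSlotsPerNode) {
        int totalSlots = nodes * mapSlotsPerNode;
        // Ceiling division: a partially filled last wave still counts as a wave.
        return (mapTasks + totalSlots - 1) / totalSlots;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 4 tasktrackers with 2 map slots each = 8 slots.
        System.out.println(waves(20, 4, 2)); // 20 tasks on 8 slots -> 3 waves
    }
}
```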

The map function is called once for each input record. A block is typically
64MB and can contain many records, so a map task means running the map()
function on every record in the block.
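To make the "one map() call per record" point concrete, here is a Hadoop-free Java model of a single map task. The class and method names are hypothetical; it assumes line-oriented text input where, as with TextInputFormat, each record is one line keyed by its byte offset. In the real new-API `org.apache.hadoop.mapreduce.Mapper`, the default `run()` method performs essentially this loop, calling `map()` while `context.nextKeyValue()` returns true.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one map task (not Hadoop code): the task walks every record
// in its input split and calls map() once per record.
public class MapTaskModel {

    private int mapCalls = 0;                          // how many times map() ran
    private final List<String> output = new ArrayList<>();

    // Stand-in for a user's map(): called once per record, emits (word, 1) pairs.
    void map(long offset, String line) {
        mapCalls++;
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                output.add(word + "\t1");
            }
        }
    }

    // Stand-in for the map task itself: iterate over all records (lines) in the split.
    void run(String split) {
        long offset = 0;
        for (String line : split.split("\n")) {
            map(offset, line);
            offset += line.length() + 1;               // start of next line, assuming 1-byte chars
        }
    }

    int mapCalls() { return mapCalls; }

    List<String> output() { return output; }

    public static void main(String[] args) {
        MapTaskModel task = new MapTaskModel();
        task.run("the quick brown fox\njumps over\nthe lazy dog");
        System.out.println("map() calls: " + task.mapCalls());     // 3 records -> 3 calls
        System.out.println("pairs emitted: " + task.output().size()); // 9 words
    }
}
```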

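Number of blocks = number of map tasks (not mappers). Strictly speaking, it
is the number of input splits, and by default there is roughly one split
per block.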
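A sketch of how the number of map tasks for one file can be estimated. It is modeled on the default `FileInputFormat` split logic as I understand it (split size equal to the block size, with the last split allowed to be up to 10% larger); the class and method names are hypothetical, and real jobs can change the split size through min/max split-size settings or use a different InputFormat entirely.

```java
// Simplified model of how many input splits (= map tasks) one file produces.
// Assumption: splitSize == block size and a 10% "slop" on the last split,
// following the default FileInputFormat behaviour as understood here.
public class SplitCount {

    static final double SPLIT_SLOP = 1.1;

    static int numSplits(long fileSize, long splitSize) {
        if (fileSize <= 0) {
            throw new IllegalArgumentException("fileSize must be positive");
        }
        int splits = 0;
        long remaining = fileSize;
        // Carve off full-size splits while the remainder is clearly larger than one split.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left (always > 0 at this point) becomes the final split.
        splits++;
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 64 * mb;
        System.out.println(numSplits(10 * mb, block));  // smaller than one block -> 1
        System.out.println(numSplits(70 * mb, block));  // within the 10% slop -> 1
        System.out.println(numSplits(200 * mb, block)); // 64 + 64 + 64 + 8 -> 4
    }
}
```

The 70 MB case is worth noting: the file is stored in two 64 MB blocks but yields a single split, which is why "blocks = map tasks" is a close approximation rather than an exact rule. A file smaller than one block gives one map task, but map() still runs once per record in it, not once in total.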

Furthermore, you have to distinguish between the two layers. The
computation layer consists of a jobtracker and a set of tasktrackers. The
other layer is responsible for storage: HDFS, which has a namenode and a
set of datanodes.

In MapReduce the code is executed where the data is. So if a block is
stored on datanodes 1, 2 and 3, the map task associated with this block
will most likely run on one of those physical nodes, by tasktracker 1, 2 or
3. This is not guaranteed, though: if none of those nodes has a free slot,
the task can be scheduled on another node and read the block over the
network.
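The locality preference can be sketched as a toy placement function in Java. This is a deliberately simplified, hypothetical model: the real jobtracker scheduling is more involved (for example, it also considers rack locality), and the node names and slot counts below are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of data-local placement for a single map task (not Hadoop code).
public class LocalityScheduler {

    // Prefer a node that stores a replica of the block and has a free map slot;
    // otherwise fall back to any node with a free slot, which then has to read
    // the block over the network. Empty result: no free slot anywhere, so the
    // task waits for a later round.
    static Optional<String> chooseNode(List<String> replicaNodes,
                                       Map<String, Integer> freeSlots) {
        for (String node : replicaNodes) {
            if (freeSlots.getOrDefault(node, 0) > 0) {
                return Optional.of(node);              // data-local
            }
        }
        for (Map.Entry<String, Integer> entry : freeSlots.entrySet()) {
            if (entry.getValue() > 0) {
                return Optional.of(entry.getKey());    // remote read
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Block replicated on node1, node2 and node3 (hypothetical names).
        List<String> replicas = Arrays.asList("node1", "node2", "node3");

        Map<String, Integer> freeSlots = new LinkedHashMap<>();
        freeSlots.put("node1", 0);
        freeSlots.put("node2", 1);
        freeSlots.put("node4", 2);
        System.out.println(chooseNode(replicas, freeSlots)); // Optional[node2]

        freeSlots.put("node2", 0);                            // node2 becomes busy
        System.out.println(chooseNode(replicas, freeSlots)); // Optional[node4]
    }
}
```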

Hopefully this gives you a little more insight.

Regards, Dieter

2014-02-25 7:05 GMT+01:00 Sugandha Naolekar <sugandha.n87@gmail.com>:

> One more thing to ask: no. of blocks = no. of mappers. Thus, the map()
> function will be called that many times, right?
> --
> Thanks & Regards,
> Sugandha Naolekar
> On Tue, Feb 25, 2014 at 11:27 AM, Sugandha Naolekar <
> sugandha.n87@gmail.com> wrote:
>> Hello,
>> From the various articles I have gone through so far, files are split
>> into chunks/blocks. On the same note, I would like to ask a few things:
>>    1. The no. of mappers is decided as Total_File_Size / Max. Block Size.
>>    Thus, if the file is smaller than the block size, only one mapper will
>>    be invoked. Right?
>>    2. If yes, it means map() will be called only once. Right? In this
>>    case, if there are two datanodes with a replication factor of 1, only
>>    one datanode (mapper machine) will perform the task. Right?
>>    3. The map() function is called by all the datanodes/slaves, right? If
>>    the no. of mappers is more than the no. of slaves, what happens?
>> --
>> Thanks & Regards,
>> Sugandha Naolekar
