hadoop-hdfs-user mailing list archives

From Tomasz Guziałek <tom...@guzialek.info>
Subject Re: The number of simultaneous map tasks is unexpected.
Date Wed, 09 Jul 2014 07:47:41 GMT
Thank you for your assistance, Adam.

Containers running | Memory used | Memory total | Memory reserved
                 8 |        8 GB |      9.26 GB |             0 B

It seems you are right: the ApplicationMaster is occupying one slot, as I
have 8 containers running but only 7 map tasks.

I also went back and corrected my information about the m1.large instance
type on EC2: there are only 2 cores per node, giving 4 compute units (ECUs,
as Amazon calls them), so 8 slots at a time is expected. However, scheduling
the AM on a slave node ruins my experiment. I am comparing the M/R
implementation with a custom one, where one node is dedicated to coordination
and I utilize the 4 slaves fully for computation. The one core taken by the
AM is doubling my execution time. Does anyone have an idea how to get 8 map
tasks running?

Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek


2014-07-09 0:56 GMT+02:00 Adam Kawa <kawa.adam@gmail.com>:

> If you run an application (e.g. a MapReduce job) on a YARN cluster, first the
> ApplicationMaster is started on some slave node to coordinate the execution
> of all tasks within the job. The ApplicationMaster and the tasks that belong
> to its application run in containers controlled by the NodeManagers.
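>
> You can also see this from code; a rough sketch with the YarnClient API
> (untested, error handling omitted, assumes a surrounding method that may
> throw YarnException/IOException) reports the same numbers as the
> ResourceManager Web UI:
>
> import org.apache.hadoop.yarn.api.records.ApplicationReport;
> import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;
> import org.apache.hadoop.yarn.client.api.YarnClient;
> import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
> YarnClient yarn = YarnClient.createYarnClient();
> yarn.init(new YarnConfiguration());
> yarn.start();
> for (ApplicationReport app : yarn.getApplications()) {
>     ApplicationResourceUsageReport usage = app.getApplicationResourceUsageReport();
>     // The used-container count includes the AM container, not only map tasks.
>     System.out.println(app.getName() + ": " + usage.getNumUsedContainers()
>         + " containers, " + usage.getUsedResources());
> }
> yarn.stop();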
>
> Maybe you simply run 8 containers on your YARN cluster: 1 container is
> consumed by the MapReduce AppMaster and 7 containers are consumed by map
> tasks. But that does not seem to be the root cause of your problem, because
> according to your settings you should be able to run at most 16 containers
> (presumably 4 nodes x 4 vcores, with one vcore per container).
>
> Another idea might be that you are bottlenecked by the amount of memory on
> the cluster (each container consumes memory) and, despite having vcore(s)
> available, you cannot launch new tasks. When you go to the ResourceManager
> Web UI, do you see that the whole cluster memory is utilized?
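>
> Extending the sketch above, the same per-node memory numbers can be pulled
> with getNodeReports (again untested; memory is reported in MB):
>
> import org.apache.hadoop.yarn.api.records.NodeReport;
> import org.apache.hadoop.yarn.api.records.NodeState;
>
> for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
>     // Compare per-node used memory against total capacity.
>     System.out.println(node.getNodeId()
>         + " used " + node.getUsed().getMemory() + " MB of "
>         + node.getCapability().getMemory() + " MB");
> }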
>
>
>
> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <tomasz@guzialek.info>:
>
>> I was not precise when describing my cluster. I have 4 slave nodes and a
>> separate master node. The master has the ResourceManager role (along with
>> the JobHistory role) and the rest have NodeManager roles. If this really is
>> the ApplicationMaster, is it possible to schedule it on the master node?
>> This single waiting map task is doubling my execution time.
>>
>> Pozdrawiam / Regards / Med venlig hilsen
>> Tomasz Guziałek
>>
>>
>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <kawa.adam@gmail.com>:
>>
>>> Isn't your MapReduce AppMaster occupying one slot?
>>>
>>> Sent from my iPhone
>>>
>>> > On 8 Jul 2014, at 13:01, Tomasz Guziałek <tomaszguzialek@gmail.com> wrote:
>>> >
>>> > Hello all,
>>> >
>>> > I am running a 4-node CDH5 cluster on Amazon EC2. The instances
>>> > are m1.large, so I have 4 cores (2 cores x 2 units) per node. My HBase
>>> > table has 8 regions, so I expected at least 8 (if not 16) map tasks to
>>> > run simultaneously. However, only 7 are running and 1 is waiting for an
>>> > empty slot. Why did this surprising number come up? I have checked that
>>> > the regions are equally distributed over the region servers (2 per node).
>>> >
>>> > My properties in the job:
>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
>>> >
>>> > My properties in the CDH:
>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>> >
>>> > Do I miss some property? Please share your experience.
>>> >
>>> > Best regards
>>> > Tomasz
>>>
>>
>>
>
