From: Tomasz Guziałek
Date: Wed, 9 Jul 2014 09:47:41 +0200
Subject: Re: The number of simultaneous map tasks is unexpected.
To: user@hadoop.apache.org

Thank you for your assistance, Adam.

Containers running | Memory used | Memory total | Memory reserved
                 8 |        8 GB |      9.26 GB |             0 B

It seems you are right: the ApplicationMaster is occupying one slot, since I have 8 containers running but only 7 map tasks.

I also revised my information about the m1.large instance type on EC2. There are only 2 cores per node, giving 4 compute units (the ECU units introduced by Amazon), so 8 slots at a time is expected. However, scheduling the AM on a slave node ruins my experiment: I am comparing the MapReduce implementation with a custom one in which one node is dedicated to coordination and the 4 slaves are used fully for computation. This one core taken by the AM extends the execution time by a factor of 2. Does anyone have an idea how to get 8 map tasks running?
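For reference, a minimal sketch of the job-side memory settings that decide how many containers fit per node. The property names are the standard MRv2/YARN ones; the values are illustrative only and assume roughly 2.3 GB of NodeManager memory per node (9.26 GB across 4 nodes) and a cluster-side minimum allocation that permits containers smaller than 1 GB:

    // Sketch only, with illustrative values: request smaller containers so that
    // 9 of them (8 map tasks + 1 ApplicationMaster) fit into ~9.26 GB of memory.
    Configuration mapReduceConfiguration = HBaseConfiguration.create();
    mapReduceConfiguration.set("mapreduce.map.memory.mb", "768");           // memory requested per map container
    mapReduceConfiguration.set("mapreduce.map.java.opts", "-Xmx614m");      // JVM heap, roughly 80% of the container
    mapReduceConfiguration.set("yarn.app.mapreduce.am.resource.mb", "768"); // ApplicationMaster container size
    // Requests are rounded up to yarn.scheduler.minimum-allocation-mb, and each node
    // is capped by yarn.nodemanager.resource.memory-mb; both are cluster-side
    // (yarn-site.xml / Cloudera Manager) settings that cannot be overridden from the job.

With 768 MB containers each node could hold three instead of two, so the ApplicationMaster would no longer displace a map task.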
Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek


2014-07-09 0:56 GMT+02:00 Adam Kawa <kawa.adam@gmail.com>:

> If you run an application (e.g. a MapReduce job) on a YARN cluster, first the
> ApplicationMaster is started on some slave node to coordinate the execution
> of all tasks within the job. The ApplicationMaster and the tasks that belong
> to its application run in containers controlled by the NodeManagers.
>
> Maybe you simply run 8 containers on your YARN cluster: 1 container is
> consumed by the MapReduce ApplicationMaster and 7 containers are consumed by
> map tasks. But that does not seem to be the root cause of your problem,
> because according to your settings you should be able to run at most 16
> containers.
>
> Another idea might be that you are bottlenecked by the amount of memory on
> the cluster (each container consumes memory), so despite having vcore(s)
> available you cannot launch new tasks. When you go to the ResourceManager
> Web UI, do you see that you are using the whole cluster memory?
>
>
> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <tomasz@guzialek.info>:
>
>> I was not precise when describing my cluster. I have 4 slave nodes and a
>> separate master node. The master has the ResourceManager role (along with
>> the JobHistory role) and the rest have NodeManager roles. If this really is
>> an ApplicationMaster, is it possible to schedule it on the master node?
>> This single waiting map task is doubling my execution time.
>>
>> Pozdrawiam / Regards / Med venlig hilsen
>> Tomasz Guziałek
>>
>>
>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <kawa.adam@gmail.com>:
>>
>>> Isn't your MapReduce AppMaster occupying one slot?
>>>
>>> Sent from my iPhone
>>>
>>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <tomaszguzialek@gmail.com> wrote:
>>> >
>>> > Hello all,
>>> >
>>> > I am running a 4-node CDH5 cluster on Amazon EC2. The instances used are
>>> > m1.large, so I have 4 cores (2 cores x 2 units) per node. My HBase table
>>> > has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run
>>> > simultaneously. However, only 7 are running and 1 is waiting for an empty
>>> > slot. Why did this surprising number come up? I have checked that the
>>> > regions are equally distributed across the region servers (2 per node).
>>> >
>>> > My properties in the job:
>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
>>> >
>>> > My properties in the CDH:
>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>> >
>>> > Am I missing some property? Please share your experience.
>>> >
>>> > Best regards
>>> > Tomasz
>>
>
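One clarification on the configuration quoted above: mapreduce.tasktracker.map.tasks.maximum is an MRv1 (TaskTracker) property and has no effect on a YARN cluster, where per-node concurrency is bounded by the NodeManager's memory and vcore budget instead. Below is a minimal sketch for inspecting those bounds from the client side; it assumes the client's yarn-site.xml mirrors the cluster configuration, and the class name is made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ShowYarnLimits {
        public static void main(String[] args) {
            // Loads yarn-default.xml plus the client's yarn-site.xml from the classpath;
            // the printed values only match the cluster if that file is kept in sync.
            Configuration conf = new YarnConfiguration();
            System.out.println("NM memory per node (MB): "
                    + conf.get("yarn.nodemanager.resource.memory-mb"));
            System.out.println("NM vcores per node:      "
                    + conf.get("yarn.nodemanager.resource.cpu-vcores"));
            System.out.println("Min allocation (MB):     "
                    + conf.get("yarn.scheduler.minimum-allocation-mb"));
            // Containers per node is roughly min(NM memory / mapreduce.map.memory.mb,
            // NM vcores / mapreduce.map.cpu.vcores); with ~2.3 GB per node and 1 GB
            // map containers that gives 2 per node, i.e. 8 containers in total.
        }
    }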