From: Antoine Vandecreme <antoine.vandecreme@nist.gov>
To: user@hadoop.apache.org
Subject: Re: How to make hadoop use all nodes?
Date: Mon, 23 Sep 2013 14:14:34 -0400

Hi Omkar,

 

>(which has 40 containers slots.) >> for total cluster?

Yes, it was just a hypothetical value though.

Below are my real configurations.

 

>1) yarn-site.xml -> what is the resource memory configured for per node?

12288mb

 

>2) yarn-site.xml -> what is the minimum resource allocation for the cluster?

1024mb min

12288mb max
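For reference, in case it helps cross-check: these limits normally come from the following yarn-site.xml properties (the property names are the standard YARN ones; the values simply restate the figures above):

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>
</property>

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>12288</value>
</property>
```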

 

I also have these memory configurations in mapred-site.xml:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>5000</value>
</property>

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx4g -Djava.awt.headless=true</value>
</property>

<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>5000</value>
</property>

<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx4g -Djava.awt.headless=true</value>
</property>

 

>3) yarn-resource-manager-log (while starting resource manager "export YARN_ROOT_LOGGER=DEBUG,RFA").. I am looking for debug logs..

The resulting log is really verbose. Are you searching for something in particular?

 

>4) On RM UI how much total cluster memory is reported (how many total nodes). ( RM UI click on Cluster)

So I have 58 active nodes and total memory reported is 696GB which is 58x12 as expected.

I have 93 containers running instead of the 116 I would expect (my job has 2046 maps, so it could use all 116 containers).
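To show where my 116 figure comes from, here is the arithmetic as a small sketch (plain back-of-the-envelope math using the values above; as far as I understand, YARN rounds each request up to the next multiple of the minimum allocation):

```python
import math

min_alloc = 1024   # yarn.scheduler.minimum-allocation-mb
node_mem = 12288   # memory each node offers to YARN (MB)
nodes = 58
map_mem = 5000     # mapreduce.map.memory.mb

# YARN normalizes every request up to a multiple of the minimum allocation,
# so a 5000 MB map task actually occupies a 5120 MB container.
normalized = math.ceil(map_mem / min_alloc) * min_alloc
per_node = node_mem // normalized   # containers that fit on one node
total = nodes * per_node

print(normalized, per_node, total)  # 5120 2 116
```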

 

Here is a copy-paste of what I have in the scheduler tab:

 

 

Queue State: RUNNING
Used Capacity: 99.4%
Absolute Capacity: 100.0%
Absolute Max Capacity: 100.0%
Used Resources:
Num Active Applications: 1
Num Pending Applications: 0
Num Containers: 139
Max Applications: 10000
Max Applications Per User: 10000
Max Active Applications: 70
Max Active Applications Per User: 70
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Active users: xxx <Memory: 708608 (100.00%), vCores: 139 (100.00%), Active Apps: 1, Pending Apps: 0>

 

I don't know where the 139 containers value is coming from.
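Purely as a guess at the arithmetic: if the 5000 MB map requests are normalized up to 5120 MB, and the MRAppMaster was granted a 2048 MB container (an assumption on my part; the AM size is controlled by yarn.app.mapreduce.am.resource.mb, whose value I don't know for this cluster), then 138 maps plus 1 AM would reproduce both the 139 containers and the 708608 MB shown for the active user:

```python
import math

min_alloc = 1024
map_mem = 5000
normalized = math.ceil(map_mem / min_alloc) * min_alloc  # 5120 MB per map container

maps = 138
am_mem = 2048            # assumed AM container size (hypothetical)
used = maps * normalized + am_mem
containers = maps + 1    # map containers plus the AM

print(used, containers)  # 708608 139
```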

 

>5) which scheduler you are using? Capacity/Fair/FIFO

I did not set yarn.resourcemanager.scheduler.class so apparently the default is Capacity.

 

>6) have you configured any user limits/ queue capacity? (please add details).

No.

 

>7) All requests you are making at same priority or with different priorities? (Ideally it will not matter but want to know).

I don't set any priority.

 

Thanks for your help.

 

Antoine Vandecreme

 

On Friday, September 20, 2013 12:20:38 PM Omkar Joshi wrote:

> Hi,

>

> few more questions

>

> (which has 40 containers slots.) >> for total cluster? Please give below

> details

>

> for cluster

> 1) yarn-site.xml -> what is the resource memory configured for per node?

> 2) yarn-site.xml -> what is the minimum resource allocation for the cluster?

> 3) yarn-resource-manager-log (while starting resource manager "export

> YARN_ROOT_LOGGER=DEBUG,RFA").. I am looking for debug logs..

> 4) On RM UI how much total cluster memory is reported (how many total

> nodes). ( RM UI click on Cluster)

> 5) which scheduler you are using? Capacity/Fair/FIFO

> 6) have you configured any user limits/ queue capacity? (please add

> details).

> 7) All requests you are making at same priority or with different

> priorities? (Ideally it will not matter but want to know).

>

> Please let us know all the above details. Thanks.

>

>

> Thanks,

> Omkar Joshi

> *Hortonworks Inc.* <http://www.hortonworks.com>

>

>

> On Fri, Sep 20, 2013 at 6:55 AM, Antoine Vandecreme <

>

> antoine.vandecreme@nist.gov> wrote:

> > Hello Omkar,

> >

> > Thanks for your reply.

> >

> > Yes, all 4 points are correct.

> > However, my application is requesting, let's say, 100 containers on my cluster

> > which has 40 containers slots.

> > So I expected to see all containers slots used but that is not the case.

> >

> > Just in case it matters, it is the only application running on the server.

> >

> > Thanks,

> > Antoine Vandecreme

> >

> > On Thursday, September 19, 2013 04:49:36 PM Omkar Joshi wrote:

> > > Hi,

> > >

> > > Let me clarify few things.

> > > 1) you are making container requests which are not explicitly looking

> > > for

> > > certain nodes. (No white listing).

> > > 2) All nodes are identical in terms of resources (memory/cores) and

> > > every

> > > container requires same amount of resources.

> > > 3) All nodes have capacity to run say 2 containers.

> > > 4) You have 20 nodes.

> > >

> > > Now if an application is running and is requesting 20 containers then

> > > you

> > > cannot say that you will get all on different nodes (uniformly

> > > distributed). It depends more on which node heartbeated to the Resource

> > > manager at what time and how much memory is available with it and also

> >

> > how

> >

> > > many applications are present in queue and how much they are requesting

> >

> > at

> >

> > > what request priorities. If it has say sufficient memory to run 2

> > > containers then they will get allocated (This allocation is quite

> > > complex

> > > ..I am assuming a very simple "*" request). So you may see a few running 2,

> >

> > few

> >

> > > running 1, whereas a few run 0 containers.

> > >

> > > I hope it clarifies your doubt.

> > >

> > > Thanks,

> > > Omkar Joshi

> > > *Hortonworks Inc.* <http://www.hortonworks.com>

> > >

> > >

> > > On Thu, Sep 19, 2013 at 7:19 AM, Vandecreme, Antoine <

> > >

> > > antoine.vandecreme@nist.gov> wrote:

> > > > Hi all,

> > > >

> > > > I am working with Hadoop 2.0.5 (I plan to migrate to 2.1.0 soon).

> > > > When I am starting a Job, I notice that some nodes are not used or

> > > > partially used.

> > > >

> > > > For example, if my nodes can hold 2 containers, I notice that some

> >

> > nodes

> >

> > > > are not running any or just 1 while others are running 2.

> > > > All my nodes are configured the same way.

> > > >

> > > > Is this an expected behavior (maybe in case other jobs are started)?

> > > > Is there a configuration to change this behavior?

> > > >

> > > > Thanks,

> > > > Antoine
