hadoop-mapreduce-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Ubuntu open file limits
Date Fri, 02 Oct 2015 23:14:04 GMT
OK, now it is about to get really interesting. It turns out that the nodes of a cluster are
not configured symmetrically. 

If I use YARN's distributed shell to run multiple instances of "ulimit -a", so that they spread
across the cluster nodes: 
dsjar=/usr/hdp/2.2.8.0-3150/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar
hadoop jar $dsjar org.apache.hadoop.yarn.applications.distributedshell.Client --jar $dsjar
--shell_command 'ulimit -a' --num_containers 9 

yarn logs -applicationId application_1443767835805_0009 > /tmp/foo 

egrep 'Container:|open files' /tmp/foo 
Container: container_e03_1443457398740_0223_01_000009 on rpb-ubn-hdin-1.office.datalever.com_45454
open files (-n) 32768 
Container: container_e03_1443457398740_0223_01_000006 on rpb-ubn-hdin-1.office.datalever.com_45454
open files (-n) 32768 
Container: container_e03_1443457398740_0223_01_000003 on rpb-ubn-hdin-1.office.datalever.com_45454
open files (-n) 32768 
Container: container_e03_1443457398740_0223_01_000007 on rpb-ubn-hdin-2.office.datalever.com_45454
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000010 on rpb-ubn-hdin-2.office.datalever.com_45454
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000004 on rpb-ubn-hdin-2.office.datalever.com_45454
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000008 on rpb-ubn-hdin-3.office.datalever.com_45454
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000005 on rpb-ubn-hdin-3.office.datalever.com_45454
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000002 on rpb-ubn-hdin-3.office.datalever.com_45454
open files (-n) 4096 
Container: container_e03_1443457398740_0223_01_000001 on rpb-ubn-hdin-3.office.datalever.com_45454

Only the first worker node has the higher file limit.  The rest have lower limits.
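As a side note, the paired "Container:" / "open files" lines can be collapsed into a
one-line-per-container summary (purely a convenience sketch; the field positions assume
the log layout shown above):

```shell
# Pair each "Container: <id> on <host_port>" line with the
# "open files" line that follows it, then deduplicate.
awk '/^Container:/ {host=$4}
     /open files/ {print host, $NF}' /tmp/foo | sort -u
```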

I have verified this on two separate clusters now.  The same discrepancies are observed by
looking at /proc/<PID>/limits for the datanode processes on each worker node.
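For reference, the relevant values can be pulled straight out of /proc (shown here against
the current shell's own PID just for illustration; substitute the datanode PID in practice):

```shell
# Print the soft and hard open-file limits of a process from /proc.
# $$ (this shell) stands in for the datanode PID.
awk '/Max open files/ {print "soft=" $4, "hard=" $5}' "/proc/$$/limits"
```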

This is starting to look like an Ambari issue, perhaps?  

John Lilley

-----Original Message-----
From: John Lilley 
Sent: Thursday, October 1, 2015 10:22 AM
To: Varun Vasudev <vvasudev@apache.org>
Subject: RE: Ubuntu open file limits

That's the frustrating thing.  Apparently on Ubuntu (maybe just 12.04?), services do not get
their limits from /etc/security/limits.conf.  We put these entries in long ago but they have
no effect:

* hard nofile 65536
* soft nofile 65536
root hard nofile 65536
root soft nofile 65536
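One possible explanation (an assumption on my part, not verified on these boxes): services
started by upstart, rather than through a PAM login session, never pass through pam_limits,
so limits.conf is simply never consulted for them. Upstart jobs take their limits from a
"limit" stanza in the job file instead, along these lines:

```
# /etc/init/some-service.conf  (hypothetical job file name)
# Upstart syntax: limit <resource> <soft> <hard>
limit nofile 65536 65536
```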

John Lilley


-----Original Message-----
From: Varun Vasudev [mailto:vvasudev@apache.org]
Sent: Thursday, October 01, 2015 10:06 AM
To: John Lilley <john.lilley@redpoint.net>
Subject: Re: Ubuntu open file limits

Ok. I’m not sure why ambari-agent has such low limits. Did you reboot the machine after
changing the limits in the limits.conf?

-Varun



On 10/1/15, 9:32 PM, "John Lilley" <john.lilley@redpoint.net> wrote:

>12.04 LTS
>
>BTW it appears that ambari-agent has a hard nofile limit of 4096:
>
>$ sudo service ambari-agent status
>Found ambari-agent PID: 1463
>ambari-agent running.
>
>$ cat /proc/1463/limits
>Limit                     Soft Limit           Hard Limit           Units
>Max cpu time              unlimited            unlimited            seconds
>Max file size             unlimited            unlimited            bytes
>Max data size             unlimited            unlimited            bytes
>Max stack size            8388608              unlimited            bytes
>Max core file size        0                    unlimited            bytes
>Max resident set          unlimited            unlimited            bytes
>Max processes             95970                95970                processes
>Max open files            1024                 4096                 files
>Max locked memory         65536                65536                bytes
>Max address space         unlimited            unlimited            bytes
>Max file locks            unlimited            unlimited            locks
>Max pending signals       95970                95970                signals
>Max msgqueue size         819200               819200               bytes
>Max nice priority         0                    0
>Max realtime priority     0                    0
>Max realtime timeout      unlimited            unlimited            us
>
>John Lilley
>
>
>-----Original Message-----
>From: Varun Vasudev [mailto:vvasudev@apache.org]
>Sent: Thursday, October 01, 2015 9:59 AM
>To: John Lilley <john.lilley@redpoint.net>
>Subject: Re: Ubuntu open file limits
>
>Hi John,
>
>Which version of HDP are you running?
>
>-Varun
>
>
>
>
>
>On 10/1/15, 9:26 PM, "John Lilley" <john.lilley@redpoint.net> wrote:
>
>>Thanks for the suggestion, but no files in that folder contain "nofile" .
>>
>>This is the contents of that folder:
>>-rwx------ 1 root root  1052 Apr 13 12:26 ambari-env.sh
>>-rwxr-xr-x 1 root root  1365 Apr 13 12:26 ambari-python-wrap
>>-rwxr-xr-x 1 root root  1361 Apr 13 12:26 ambari-sudo.sh
>>drwxr-xr-x 8 root root  4096 Sep 16 12:51 cache
>>drwxr-xr-x 3 root root 36864 Oct  1 09:54 data
>>-rwx------ 1 root root  3114 Apr 13 12:26 install-helper.sh
>>drwxr-xr-x 2 root root  4096 Apr 13 12:26 keys
>>
>>Is one of these files a candidate for placing a "ulimit -n" command to raise the limit?
>>
>>Thanks,
>>John Lilley
>>
>>
>>-----Original Message-----
>>From: Varun Vasudev [mailto:vvasudev@apache.org]
>>Sent: Thursday, October 01, 2015 9:49 AM
>>To: John Lilley <john.lilley@redpoint.net>
>>Subject: Re: Ubuntu open file limits
>>
>>Hi John,
>>
>>Run "grep -r yarn_user_nofile_limit /var/lib/ambari-agent/*". It should give some
>>idea about where the 4096 value is coming from.
>>
>>-Varun
>>
>>
>>
>>On 9/30/15, 5:37 PM, "John Lilley" <john.lilley@redpoint.net> wrote:
>>
>>>Greetings,
>>>
>>>We are starting to support Ubuntu 12.04 LTS servers and HDP, but we are hitting
>>>the "open file limits" problem. Unfortunately, setting this system-wide on Ubuntu seems
>>>difficult -- no matter what we try, YARN tasks always show the result of ulimit -n as
>>>1024 (or, if we attempt to override, 4096). Something is setting a system-wide hard
>>>open-file limit of 4096 before the ResourceManager and NodeManagers start, and our
>>>tasks inherit that limit. This causes all sorts of problems; as you must know, Hadoop
>>>really wants this limit to be 65536 or more.
>>>
>>>What I want is to change the system-wide default open-file limit for everything,
>>>so that Hadoop services and everything else pick it up. How do we do that?
>>>
>>>We've tried all of the obvious stuff from Stack Overflow etc., like:
>>>
>>>
>>># vi /etc/security/limits.conf
>>>
>>>* soft nofile 65536
>>>
>>>* hard nofile 65536
>>>
>>>root soft nofile 65536
>>>
>>>root hard nofile 65536
>>>
>>>But none of this seems to affect the RM/NM limits.
>>>
>>>Thanks
>>>john
>>>
>>
>
