hadoop-mapreduce-user mailing list archives

From Krishna Kishore Bonagiri <write2kish...@gmail.com>
Subject Re: Too many open files error with YARN
Date Thu, 21 Mar 2013 06:43:34 GMT
Hi Hemanth & Sandy,

  Thanks for your replies. Yes, lsof shows the socket in the CLOSE_WAIT state,
exactly as below:

java      30718     dsadm  200u     IPv4         1178376459      0t0  TCP *:50010 (LISTEN)
java      31512     dsadm  240u     IPv6         1178391921      0t0  TCP node1:51342->node1:50010 (CLOSE_WAIT)
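
A minimal Python sketch for tallying these socket states from saved lsof
output, assuming each lsof record sits on a single line. The function name
count_states and the SAMPLE text are illustrative only; they are not part of
Hadoop or lsof.

```python
import re
from collections import Counter

# Matches the NAME column of an lsof TCP record, for example
# "node1:51342->node1:50010 (CLOSE_WAIT)". Listening sockets have no
# "->" and are therefore skipped.
CONN = re.compile(r"(\S+):(\d+)->(\S+):(\d+) \((\w+)\)")


def count_states(lsof_text, remote_port):
    """Count TCP states of connections whose remote end is remote_port."""
    states = Counter()
    for line in lsof_text.splitlines():
        match = CONN.search(line)
        if match and int(match.group(4)) == remote_port:
            states[match.group(5)] += 1
    return states


# Hypothetical saved lsof output, modelled on the two records above.
SAMPLE = """\
java 30718 dsadm 200u IPv4 1178376459 0t0 TCP *:50010 (LISTEN)
java 31512 dsadm 240u IPv6 1178391921 0t0 TCP node1:51342->node1:50010 (CLOSE_WAIT)
"""

if __name__ == "__main__":
    # 50010 is the datanode port mentioned in this thread.
    print(count_states(SAMPLE, 50010))
```

Running it after each application run and watching whether the CLOSE_WAIT
count keeps climbing is one way to confirm that connections are being leaked.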

I just checked https://issues.apache.org/jira/browse/HDFS-3357, and it lists
2.0.0-alpha under both the affected versions and the fix versions.

There is another bug, HDFS-3591, at
https://issues.apache.org/jira/browse/HDFS-3591

which says it is for backporting HDFS-3357 to branch 0.23.

So I don't understand whether the fix is really in 2.0.0-alpha. Could you
please clarify?

Thanks,
Kishore





On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:

> There was an issue related to hung connections (HDFS-3357), but the JIRA
> indicates the fix is available in Hadoop-2.0.0-alpha. Still, it would be
> worth checking on Sandy's suggestion.
>
>
> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>
>> Hi Kishore,
>>
>> 50010 is the datanode port. Does your lsof indicate that the sockets are
>> in CLOSE_WAIT?  I had come across an issue like this where that was a
>> symptom.
>>
>> -Sandy
>>
>>
>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>> write2kishore@gmail.com> wrote:
>>
>>> Hi,
>>>
>>>  I am running the date command with YARN's distributed shell example
>>> 1000 times in a loop, like this:
>>>
>>> yarn jar
>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>> --shell_command date --num_containers 2
>>>
>>>
>>> Around the 730th run, I get an error in the node manager's log saying
>>> that it failed to launch a container because there are "Too many open
>>> files". When I check with the lsof command, I find that one open file of
>>> this kind is left behind for each run of the Application Master, and the
>>> count keeps growing as the loop runs.
>>>
>>> node1:44871->node1:50010
>>>
>>> Is this a known issue? Or am I missing something? Please help.
>>>
>>> Note: I am working on hadoop-2.0.0-alpha
>>>
>>> Thanks,
>>> Kishore
>>>
>>
>>
>
