hadoop-mapreduce-user mailing list archives

From Krishna Kishore Bonagiri <write2kish...@gmail.com>
Subject Re: Too many open files error with YARN
Date Thu, 21 Mar 2013 14:53:29 GMT
Hi Hemanth,
  Thanks for the reply. I shall try to get that jstack and reply back. I am
also trying to download hadoop-2.0.3-alpha to see if I can overcome this
error.

Thanks,
Kishore




On Thu, Mar 21, 2013 at 3:24 PM, Hemanth Yamijala
<yhemanth@thoughtworks.com> wrote:

> There is a way to confirm whether it is the same bug. Can you take a jstack of
> the process that has established a connection to 50010 and post it here?
>
> Thanks
> hemanth
>
>
> On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Hemanth & Sandy,
>>
>>   Thanks for your reply. Yes, lsof indicates the socket is in CLOSE_WAIT
>> state, exactly as below:
>>
>> java      30718     dsadm  200u     IPv4         1178376459      0t0    TCP *:50010 (LISTEN)
>> java      31512     dsadm  240u     IPv6         1178391921      0t0    TCP node1:51342->node1:50010 (CLOSE_WAIT)
>>
>> I just checked the link
>> https://issues.apache.org/jira/browse/HDFS-3357 and it shows 2.0.0-alpha
>> in both the affected versions and the fix versions.
>>
>> There is another bug, HDFS-3591, at
>> https://issues.apache.org/jira/browse/HDFS-3591
>>
>> which says it is for backporting HDFS-3357 to branch 0.23.
>>
>> So I don't understand whether the fix is really in 2.0.0-alpha. Could you
>> please clarify?
>>
>> Thanks,
>> Kishore
>>
>>
>>
>>
>>
>> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>
>>> There was an issue related to hung connections (HDFS-3357), but the JIRA
>>> indicates the fix is available in Hadoop-2.0.0-alpha. Still, it would be
>>> worth checking on Sandy's suggestion.
>>>
>>>
>>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> 50010 is the datanode port. Does your lsof indicate that the sockets
>>>> are in CLOSE_WAIT?  I had come across an issue like this where that was a
>>>> symptom.
>>>>
>>>> -Sandy
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>>> write2kishore@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>  I am running a date command with YARN's distributed shell example in
>>>>> a loop, 1000 times, like this:
>>>>>
>>>>> yarn jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>>> --shell_command date --num_containers 2
>>>>>
>>>>>
>>>>> Around the 730th run or so, I get an error in the node manager's log
>>>>> saying that it failed to launch a container because there are "Too many
>>>>> open files". When I check with the lsof command, I find that one
>>>>> connection of the following kind is left open for each run of the
>>>>> Application Master, and the count keeps growing as I run it in the loop.
>>>>>
>>>>> node1:44871->node1:50010
>>>>>
>>>>> Is this a known issue? Or am I missing something? Please help.
>>>>>
>>>>> Note: I am working on hadoop-2.0.0-alpha
>>>>>
>>>>> Thanks,
>>>>> Kishore
>>>>>
>>>>
>>>>
>>>
>>
>
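
The lsof check discussed in this thread can be sketched as a pair of small
shell helpers. This is only an illustration under stated assumptions, not
something posted by the participants: the function names count_close_wait and
close_wait_pids are hypothetical, the port defaults to 50010 (the datanode
port named above), and the input is assumed to be lsof output with one socket
per line, with the connection name and state as the last two fields.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the thread).
# Both read lsof-style output on stdin, one socket per line.

# Print how many sockets are in CLOSE_WAIT towards the given port.
count_close_wait() {
    port="${1:-50010}"   # assumption: datanode port as mentioned in the thread
    # "--" stops option parsing because the pattern starts with "-".
    grep -c -- "->[^ ]*:${port} (CLOSE_WAIT)"
}

# Print the unique PIDs (second column) that hold such sockets.
close_wait_pids() {
    port="${1:-50010}"
    awk -v port="$port" '
        # Last field is the TCP state, the one before it is local->remote.
        $NF == "(CLOSE_WAIT)" && $(NF-1) ~ (":" port "$") { print $2 }
    ' | sort -u
}
```

A possible way to use them, assuming lsof and jstack are on the PATH and the
commands are run as a user allowed to inspect the processes:

    lsof -nP -i TCP:50010 | count_close_wait
    lsof -nP -i TCP:50010 | close_wait_pids    # then: jstack <pid>

Running the count before and after each loop iteration would show whether one
connection is leaked per Application Master run, as described in the original
report.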
