hadoop-mapreduce-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Yarn -- one of the daemons getting killed
Date Tue, 17 Dec 2013 19:01:36 GMT
That's good info. It is more than likely that it is the OOM killer. See http://stackoverflow.com/questions/726690/who-killed-my-process-and-why
for example.
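
To check for this, here is a minimal Python sketch that scans syslog lines for entries that look like OOM-killer messages. The log path /var/log/messages, the exact message wording (it varies across kernel versions), and the helper names find_oom_kills and scan_log are all assumptions made for illustration, not something confirmed in this thread.

```python
import re

# Kernel OOM-killer messages usually contain one of these phrases followed by
# a PID and a process name in parentheses. The exact wording differs between
# kernel versions, so treat this pattern as an assumption, not a guarantee.
OOM_PATTERN = re.compile(
    r"(?:Out of memory: Kill(?:ed)? process|Killed process)\s+(\d+)\s+\(([^)]+)\)",
    re.IGNORECASE,
)


def find_oom_kills(lines):
    """Return a list of (pid, process_name) for lines that look like OOM kills."""
    hits = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a syslog file (hypothetical default path) for OOM-killer entries."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)
```

Calling scan_log() as a user who can read the log returns the PIDs and names of processes the kernel reported killing. An empty result does not rule out the OOM killer, because the messages may have been rotated out or written to a different file.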

Thanks,
+Vinod

On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:

> Hi Jeff,
> 
>   I ran the resource manager in the foreground without nohup, and here are the messages from when it was killed. It says "Killed" but doesn't say why!
> 
> 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application appattempt_1387266015651_0258_000001 released container container_1387266015651_0258_01_000003 on node: host: isredeng:36576 #containers=2 available=7936 used=256 with event: FINISHED
> 13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl: container_1387266015651_0258_01_000005 Container Transitioned from ACQUIRED to RUNNING
> Killed
> 
> 
> Thanks,
> Kishore
> 
> 
> On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman <stuckman@umd.edu> wrote:
> What if you open the daemons in a "screen" session rather than running them in the background -- for example, run "yarn resourcemanager". Then you can see exactly when they terminate, and hopefully why.
> 
> From: Krishna Kishore Bonagiri
> Sent: Monday, December 16, 2013 6:20 AM
> To: user@hadoop.apache.org
> Reply To: user@hadoop.apache.org
> Subject: Re: Yarn -- one of the daemons getting killed
> 
> Hi Vinod,
> 
>  Yes, I am running on Linux.
> 
>  I searched /var/log/messages for a message confirming that the OOM killer killed my daemons, but could not find one. According to the following link, if it is a memory issue I should see a message even if the OOM killer is disabled, but I don't see it.
> 
> http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html
> 
>   Also, is memory consumption higher on a two-node cluster than on a single-node one? And I see this problem only when I give "*" as the node name.
> 
>   One other thing I suspected was the allowed number of user processes. I increased it from 1024 to 31000, but that didn't help either.
> 
> Thanks,
> Kishore
> 
> 
> On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
> Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running Linux, the Linux OOM killer may be shooting things down. When that happens, you will see something like "Killed process" in the system's syslog.
> 
> Thanks,
> +Vinod
> 
> On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:
> 
>> Vinod,
>> 
>>   One more thing I observed is that my Client, which continuously submits Application Masters one after another, also gets killed sometimes. So it is always one of the Java processes that gets killed. Does that indicate excessive memory usage by them, or something similar, that is causing them to die? If so, how can we resolve this kind of issue?
>> 
>> Thanks,
>> Kishore
>> 
>> 
>> On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:
>> No, I am running on 2 node cluster.
>> 
>> 
>> On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
>> Is all of this on a single node?
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:
>> 
>>> Hi,
>>>   I am running a small application on YARN (2.2.0) in a loop of 500 iterations, and while doing so one of the daemons (node manager, resource manager, or data node) gets killed (I mean it disappears) at a random point. I see no information in the corresponding log files. How can I find out why this is happening?
>>> 
>>>  One more observation: this happens only when I use "*" for the node name in the container requests. When I use a specific node name, everything is fine.
>>> 
>>> Thanks,
>>> Kishore
>> 
>> 
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to which
it is addressed and may contain information that is confidential, privileged and exempt from
disclosure under applicable law. If the reader of this message is not the intended recipient,
you are hereby notified that any printing, copying, dissemination, distribution, disclosure
or forwarding of this communication is strictly prohibited. If you have received this communication
in error, please contact the sender immediately and delete it from your system. Thank You.
>> 
>> 


