hadoop-mapreduce-user mailing list archives

From Vinayakumar B <vinayakuma...@huawei.com>
Subject RE: Yarn -- one of the daemons getting killed
Date Mon, 16 Dec 2013 12:11:34 GMT
Hi Krishna,

Please also check the daemons' .out files. You may find something there.


Cheers,
Vinayakumar B

From: Krishna Kishore Bonagiri [mailto:write2kishore@gmail.com]
Sent: 16 December 2013 16:50
To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod,

 Yes, I am running on Linux.

 I searched /var/log/messages for a message confirming that the OOM killer had killed my
daemons, but could not find one. According to the following link, if this were a memory
issue I should see a message even if the OOM killer is disabled, but I don't see one.

http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html

  Also, is memory consumption higher on a two-node cluster than on a single-node one? And
I see this problem only when I give "*" as the node name.

  Another thing I suspected was the limit on the number of user processes. I increased it
from 1024 to 31000, but that didn't help either.
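
For reference, a minimal sketch of how the per-user process limit mentioned above could be
read from Python on Linux. This is an illustration, not part of the original message; the
helper names `describe_limit` and `nproc_limits` are hypothetical.

```python
import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits() -> tuple:
    """Return the (soft, hard) max-user-processes limits of the current process."""
    # RLIMIT_NPROC is the limit that `ulimit -u` reports in a shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return describe_limit(soft), describe_limit(hard)
```

Calling `nproc_limits()` reports the limits of the process that runs it. A daemon started
from a different shell or user may have different limits; on Linux those can be read from
/proc/<pid>/limits for the running daemon.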

Thanks,
Kishore

On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
Yes, that is what I suspect, and it is why I asked whether everything is on a single node.
If you are running Linux, the Linux OOM killer may be shooting things down. When that
happens, you will see something like "Killed process" in the system's syslog.
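
As a rough sketch of the syslog check described above (an illustration, not part of the
original message), the following scans log lines for OOM-killer entries. The pattern and
the helper names `find_oom_lines` and `scan_log` are assumptions; the exact wording of the
kernel message varies between kernel versions, and the syslog path differs by distribution.

```python
import re
from typing import Iterable, List

# Assumed wording: the kernel typically logs "Out of memory: ..." followed by
# "Killed process <pid> (<name>)". Matching is case-insensitive to be lenient.
OOM_PATTERN = re.compile(r"out of memory|killed process", re.IGNORECASE)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Scan a syslog file (default path is an assumption) for OOM-killer lines."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Running `scan_log()` usually requires read access to the syslog file. An empty result
means only that no matching line was found in that file, not that the OOM killer is ruled out.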

Thanks,
+Vinod

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:


Vinod,

  One more thing I observed is that my client, which continuously submits Application
Masters one after another, also gets killed sometimes. So it is always one of the Java
processes that gets killed. Does that indicate excessive memory usage by them, or something
similar, that is causing them to die? If so, how can we resolve this kind of issue?

Thanks,
Kishore

On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:
No, I am running on a 2-node cluster.

On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:


Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 iterations, and while
doing so one of the daemons (node manager, resource manager, or data node) gets killed (I
mean it disappears) at a random point. I see no information in the corresponding log files.
How can I find out why this is happening?

 One more observation: this happens only when I use "*" as the node name in the container
requests. When I use a specific node name, everything is fine.

Thanks,
Kishore


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed
and may contain information that is confidential, privileged and exempt from disclosure under
applicable law. If the reader of this message is not the intended recipient, you are hereby
notified that any printing, copying, dissemination, distribution, disclosure or forwarding
of this communication is strictly prohibited. If you have received this communication in error,
please contact the sender immediately and delete it from your system. Thank You.





