hadoop-common-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: Node manager or Resource Manager crash
Date Tue, 04 Mar 2014 17:21:34 GMT
I remember you asking this question before. Check whether your OS's OOM killer is killing the process.
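One way to check is to scan the kernel log for OOM-kill records. Below is a minimal sketch in Python. It assumes the kernel reports a kill with a line containing "Killed process <pid> (<name>)", which is the wording Linux kernels commonly use, though the exact text varies between kernel versions. The function name `find_oom_kills` is hypothetical, and where the kernel log lives (for example `dmesg` output or a file under /var/log) depends on the distribution.

```python
import re

# Assumed kernel wording: "... Killed process 1234 (java) total-vm:..."
# Newer kernels prefix it with "Out of memory: "; the pattern matches both.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def find_oom_kills(lines):
    """Return a (pid, process_name) pair for each OOM-kill line found."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Feed it the lines of the kernel log from the node where the daemon died. If a returned pid belongs to the NM or RM JVM and the timestamp lines up with the last entry in the daemon's log, the OOM killer is the likely cause.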


On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:

> Hi,
>   I am running an application on a 2-node cluster. It tries to acquire all the containers available on one of the nodes and the remaining containers from the other node. When I run this application continuously in a loop, either the NM or the RM gets killed at a random point. There is no corresponding message in the log files.
> On one of the occasions when the NM was killed today, the tail of its log looked like this:
> 2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: isredeng:52867 sending out status for 16 containers
> 2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
> And at the time of NM's crash, the RM's log has the following entries:
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing isredeng:52867 of type STATUS_UPDATE
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from Call#14060 Retry#0 Wrote 40 bytes.
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: nodeUpdate: isredeng:52867 clusterResources: <memory:16384, vCores:16>
> 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Node being looked for scheduling isredeng:52867 availableResource: <memory:0, vCores:-8>
> 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151
> Note: the node on which the NM was killed is named isredeng. Do the messages above indicate anything about why it was killed?
> Thanks,
> Kishore

