hadoop-common-user mailing list archives

From Krishna Kishore Bonagiri <write2kish...@gmail.com>
Subject Node manager or Resource Manager crash
Date Tue, 04 Mar 2014 14:53:09 GMT
I am running an application on a 2-node cluster. It tries to acquire all
the containers that are available on one of the nodes and the remaining
containers from the other node. When I run this application continuously
in a loop, either the NM or the RM gets killed at a random point. There is
no corresponding message in the log files.

On one of the occasions when the NM got killed today, the tail of its log
looked like this:

2014-03-04 02:42:44,386 DEBUG
isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
health-status : true,

At the time of the NM's crash, the RM's log had the following entries:

2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
Responder: responding to
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG
nodeUpdate: isredeng:52867 clusterResources:
<memory:16384, vCores:16>
2014-03-04 02:42:40,371 DEBUG
Node being looked for scheduling isredeng:52867
availableResource: <memory:0, vCores:-8>
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151

Note: the node on which the NM got killed is named isredeng. Do the
messages above indicate anything about why it got killed?
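
The availableResource entry in the RM log above reports vCores:-8 for
isredeng, so it may be worth checking how often negative values show up
across the full log. Below is a minimal sketch for that, assuming the
resource entries keep the <memory:N, vCores:N> form shown above. The
function names are hypothetical helpers and not part of Hadoop, and a
negative value is not necessarily the reason the process got killed.

```python
import re

# Matches resource entries of the form seen in the RM log excerpt,
# e.g. "<memory:0, vCores:-8>". The format is assumed from that excerpt.
RESOURCE_RE = re.compile(r"<memory:(-?\d+), vCores:(-?\d+)>")


def parse_resource(text):
    """Return (memory, vcores) from the first resource entry in text, or None."""
    match = RESOURCE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_negative_resources(lines):
    """Return (line, (memory, vcores)) for each line with a negative value."""
    hits = []
    for line in lines:
        resource = parse_resource(line)
        if resource is not None and (resource[0] < 0 or resource[1] < 0):
            hits.append((line.strip(), resource))
    return hits


# Lines copied from the RM log excerpt in this message.
sample = [
    "nodeUpdate: isredeng:52867 clusterResources:",
    "<memory:16384, vCores:16>",
    "Node being looked for scheduling isredeng:52867",
    "availableResource: <memory:0, vCores:-8>",
]

for line, (memory, vcores) in find_negative_resources(sample):
    print(f"negative resource: memory={memory} vCores={vcores} in {line!r}")
```

To scan a real log, the same function can be given an open file object in
place of the sample list, since it only iterates over lines.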

