hadoop-hdfs-user mailing list archives

From Rahul Das <rahul.h...@gmail.com>
Subject Hadoop Namenode problem
Date Fri, 22 Jul 2011 07:15:57 GMT

I am running a Hadoop cluster with 20 DataNodes. Yesterday I found that the
NameNode was not responding (no reads or writes to HDFS were happening). It
was stuck for a few hours, so I shut down the NameNode and found the
following error in the NameNode log.

2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from
xx.xx.xx.xx:13568: output error

This error appeared for every DataNode, and the DataNodes were not able to
communicate with the NameNode.

After I restarted the NameNode, the log showed:

2011-07-21 16:31:54,110 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2011-07-21 16:31:54,223 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-07-21 16:31:54,226 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-07-21 16:31:54,280 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-07-21 16:31:54,280 INFO
2011-07-21 16:31:54,287 INFO
Initializing FSNamesystemMetrics using context
2011-07-21 16:31:54,289 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 15817482
2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 82
2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 2042701824 loaded in 166 seconds.
2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded
in 1406 seconds.

After that it stalled for a long time. About an hour later it started working
again.
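
To put the startup times in the log above into perspective, here is a rough
sketch in Python that turns the logged sizes and durations into rates. The
numbers are copied from the log lines; the function name load_rates is only
illustrative and not part of Hadoop.

```python
def load_rates(size_bytes, seconds, records=None):
    """Return simple throughput figures for one startup phase."""
    rates = {"bytes_per_sec": size_bytes / seconds}
    if records is not None:
        rates["records_per_sec"] = records / seconds
        rates["ms_per_record"] = 1000.0 * seconds / records
    return rates


# "Image file of size 2042701824 loaded in 166 seconds."
image = load_rates(2042701824, 166)

# "Edits file ... of size 12751835 edits # 138217 loaded in 1406 seconds."
edits = load_rates(12751835, 1406, records=138217)

print("image: %.1f MiB/s" % (image["bytes_per_sec"] / 2**20))
print("edits: %.1f edits/s, %.1f ms per edit"
      % (edits["records_per_sec"], edits["ms_per_record"]))
```

The edits file is far smaller than the image (about 12 MB against about 2 GB),
yet replaying it took roughly eight times longer than loading the image.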

My question is: when does the "IPC Server Responder ... output error" warning
occur, and is there a way to deal with it?
Also, if my NameNode is busy doing something, how can I find out what it is
doing?
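
For the second question, one general JVM technique (not specific to Hadoop) is
to take a thread dump of the busy process. The sketch below assumes the JDK
tools jps and jstack are on the PATH and that it is run as the user that owns
the NameNode process; the helper names find_pid and thread_dump are made up
for illustration.

```python
import shutil
import subprocess


def find_pid(jps_output, main_class="NameNode"):
    """Pick the PID whose jps entry matches main_class exactly."""
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == main_class:
            return int(parts[0])
    return None


def thread_dump(pid):
    """Run jstack against the given PID and return its text output."""
    result = subprocess.run(["jstack", str(pid)],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Assumption: both tools come from the JDK installed on this host.
    if shutil.which("jps") and shutil.which("jstack"):
        listing = subprocess.run(["jps"], capture_output=True,
                                 text=True, check=True).stdout
        pid = find_pid(listing)
        if pid is None:
            print("no NameNode process found")
        else:
            print(thread_dump(pid))
    else:
        print("jps/jstack not found on PATH")
```

Taking two or three dumps a few seconds apart and comparing which threads stay
in the same stack frames usually shows where the process is spending its time.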

