hadoop-hdfs-issues mailing list archives

From Srikanth Sundarrajan <srik...@yahoo.com>
Subject Issue with Namenode storage
Date Sat, 31 Dec 2011 18:02:05 GMT
Hi,
    NameNode storage on our cluster suddenly failed with the following errors:

2011-12-30 18:24:59,857 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable
to open edit log file /data/d1/hadoop-data/hadoop-hdfs/dfs/name/current/edits
2011-12-30 18:24:59,858 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable
to open edit log file /data/d2/hadoop-data/hadoop-hdfs/dfs/name/current/edits


dfshealth.jsp reports the storage as unhealthy:

NameNode Storage:
Storage Directory                                Type              State
/data/d1/hadoop-data/hadoop-hdfs/dfs/name        IMAGE_AND_EDITS   Failed
/data/d2/hadoop-data/hadoop-hdfs/dfs/name        IMAGE_AND_EDITS   Failed
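
Is it worth first ruling out a disk or filesystem problem on the NN host? The checks below are generic (the paths are just the ones already shown above):

df -h /data/d1 /data/d2        # rule out a full partition
mount | grep '/data/d'         # rule out a read-only remount
dmesg | tail -n 50             # look for disk/filesystem errors around 18:24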


While the cluster is still functional, the edit logs are no longer being actively written to. It looks
like if the cluster were to be restarted, we would lose changes made before this error occurred.

It seems like an image update from the secondary namenode caused this; previous pushes from the
SNN had been fine, though. The relevant NameNode log entries around the roll:

2011-12-30 18:24:59,855 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage
from 10.3.0.161
2011-12-30 18:24:59,855 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 5552 Total time for transactions(ms): 32Number of transactions batched in Syncs:
694 Number of syncs: 2690 SyncTimes(ms): 1789 1669 
2011-12-30 18:24:59,857 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to
open edit log file /data/d1/hadoop-data/hadoop-hdfs/dfs/name/current/edits
2011-12-30 18:24:59,858 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to
open edit log file /data/d2/hadoop-data/hadoop-hdfs/dfs/name/current/edits
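
If the roll from the SNN is what broke it, would the SNN's own checkpoint directory still hold a usable copy of the namespace? We were going to look there next; the paths below are guesses for our fs.checkpoint.dir and SNN log location (the real values are in hdfs-site.xml and the log dir on 10.3.0.161):

ls -l /data/d1/hadoop-data/hadoop-hdfs/dfs/namesecondary/current    # fs.checkpoint.dir -- path is a guess
tail -n 200 /var/log/hadoop/hadoop-hdfs-secondarynamenode-*.log     # SNN log -- path is a guess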

dfs.name.dir contents:
----------------------------
/data/d1/hadoop-data/hadoop-hdfs/dfs/name:
total 12
drwxr-xr-x 2 hdfs hdfs 4096 2011-12-30 18:24 current
drwxr-xr-x 2 hdfs hdfs 4096 2011-11-10 18:41 image
-rw-r--r-- 1 hdfs hdfs    0 2011-11-28 18:43 in_use.lock
drwxr-xr-x 2 hdfs hdfs 4096 2011-11-23 10:51 previous.checkpoint

/data/d1/hadoop-data/hadoop-hdfs/dfs/name/current:
total 1938828
-rw-r--r-- 1 hdfs hdfs    660736 2011-12-30 18:24 edits
-rw-r--r-- 1 hdfs hdfs 991278946 2011-12-30 18:17 fsimage
-rw-r--r-- 1 hdfs hdfs 991452881 2011-12-30 18:24 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs         8 2011-12-30 18:17 fstime
-rw-r--r-- 1 hdfs hdfs       101 2011-12-30 18:17 VERSION

/data/d1/hadoop-data/hadoop-hdfs/dfs/name/image:
total 4
-rw-r--r-- 1 hdfs hdfs 157 2011-12-30 18:17 fsimage

/data/d1/hadoop-data/hadoop-hdfs/dfs/name/previous.checkpoint:
total 9013076
-rw-r--r-- 1 hdfs hdfs 8705120607 2011-11-28 17:31 edits
-rw-r--r-- 1 hdfs hdfs  516045025 2011-11-23 10:51 fsimage
-rw-r--r-- 1 hdfs hdfs          8 2011-11-23 10:51 fstime
-rw-r--r-- 1 hdfs hdfs        101 2011-11-23 10:51 VERSION
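
(/data/d2 not shown.) Before touching anything we would also like to confirm that the copies under /data/d1 and /data/d2 agree with each other, something like:

md5sum /data/d1/hadoop-data/hadoop-hdfs/dfs/name/current/fsimage \
       /data/d2/hadoop-data/hadoop-hdfs/dfs/name/current/fsimage
md5sum /data/d1/hadoop-data/hadoop-hdfs/dfs/name/current/edits \
       /data/d2/hadoop-data/hadoop-hdfs/dfs/name/current/edits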

The getImage servlet on the NN returns the following error:

2011-12-31 13:12:34,686 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed.
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
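
(For reference, the request that trips this is the plain image fetch; namenode-host:50070 below stands in for our NN's HTTP address:)

curl -v 'http://namenode-host:50070/getimage?getimage=1'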


What could be causing this error, and what can be done to restart the cluster without loss of
data? Any solutions/pointers are greatly appreciated.
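
For completeness, the recovery path we had in mind (not attempted yet, and both steps come with assumptions: the backup destination below is just a placeholder, and we are not sure our version has dfsadmin -saveNamespace or that it would rewrite the failed directories rather than hit the same error):

# 1. Offline copies of both name directories first
cp -a /data/d1/hadoop-data/hadoop-hdfs/dfs/name /backup/name-d1-20111231
cp -a /data/d2/hadoop-data/hadoop-hdfs/dfs/name /backup/name-d2-20111231

# 2. Try to flush the live namespace to disk without restarting
hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
hadoop dfsadmin -safemode leave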

Regards
Srikanth Sundarrajan
