hadoop-hdfs-user mailing list archives

From Wei Wu <woo....@gmail.com>
Subject NameNode stuck in safemode without few missing blocks
Date Thu, 07 Jul 2011 13:14:52 GMT

We ran into a strange situation when restarting the NameNode: it cannot
leave safe mode automatically. It reports "The ratio of reported blocks
0.9986 has not reached the threshold 0.999". Our cluster has 83,276,820
blocks in total, so if the counter is right, we are missing about 116,587
blocks. But fsck reported 83,276,779 healthy blocks and 37 blocks in open
files. Only 4 blocks were marked as corrupt, because their length is shorter
than the existing ones. If the fsck result can be trusted, the ratio is
higher than 0.999999 and the threshold has been reached.
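
For reference, the arithmetic behind those figures can be checked with a few
lines of Python. This is only a sketch using the numbers quoted above; the
function and variable names are made up for illustration and are not taken
from the Hadoop source, and the reported ratio 0.9986 is a rounded display
value, so the missing-block count is approximate.

```python
# Figures quoted in the message above.
TOTAL_BLOCKS = 83_276_820
HEALTHY = 83_276_779      # blocks fsck reported as healthy
OPEN_FILE_BLOCKS = 37     # blocks in files still open for write
CORRUPT = 4               # blocks fsck marked as corrupt


def missing_from_ratio(total, ratio_per_10000):
    """Blocks implied missing by a safe-block ratio given in 1/10000 units.

    Integer arithmetic is used so the result does not depend on
    floating-point rounding.
    """
    return total * (10_000 - ratio_per_10000) // 10_000


def fsck_ratio(healthy, total):
    """Ratio of healthy blocks to total blocks according to fsck."""
    return healthy / total


# Ratio 0.9986 expressed as 9986 / 10000.
missing = missing_from_ratio(TOTAL_BLOCKS, 9986)
ratio = fsck_ratio(HEALTHY, TOTAL_BLOCKS)

print("missing blocks implied by safe mode ratio:", missing)
print("fsck categories add up to total:",
      HEALTHY + OPEN_FILE_BLOCKS + CORRUPT == TOTAL_BLOCKS)
print("fsck ratio above 0.999 threshold:", ratio > 0.999)
```

The three fsck categories account for every block, which is why the safe
mode counter and the fsck result cannot both be right.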

I suspect the blockSafe counter is not being maintained accurately. Is that
possible? Our case is similar to the situation described in JIRA:
https://issues.apache.org/jira/browse/HADOOP-2159 (our Hadoop release
already includes this patch).

Any suggestions?

