hadoop-common-user mailing list archives

From "Usman Waheed" <usm...@opera.com>
Subject Corrupt edits/edits.new file maybe?
Date Wed, 09 Mar 2011 23:06:59 GMT
Hi,

For some reason my secondary namenode process died 10 days ago, and that
has left me with both an edits and an edits.new file in my
dfs/name/current directory. The fsimage file is also there, but it is old
and does not contain the merged changes from either edits or edits.new.
The cluster has been running fine since the last startup, which was 2
weeks ago.
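
For reference, the directory in question looks roughly like this (the
listing is illustrative, sizes omitted):

  $ ls -l dfs/name/current/
  edits        <- changes since the last successful checkpoint
  edits.new    <- left behind when the secondary died mid-checkpoint
  fsimage      <- 10+ days old, without the merged changes
  fstime
  VERSION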

Today I restarted the cluster, and now the namenode fails with a
NullPointerException. The last saved checkpoint is the same size as the
fsimage in the current directory, so replacing one with the other will
not help.
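
(For what it is worth, I compared the two with something like the
following; the checkpoint path is wherever fs.checkpoint.dir points on
the secondary, the one below is just an example:)

  $ ls -l dfs/name/current/fsimage
  $ ls -l /data/hadoop/namesecondary/current/fsimage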

This is a test cluster, so the worst case is that I lose the changes that
were never merged into the fsimage. I can remove the edits.new and bring
the cluster up with a clean edits file. I will have to force the namenode
out of safe mode, and fsck then reports that HDFS is corrupt, with
missing blocks/files and so on.
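
Concretely, the steps I mean are roughly these (paths assumed, the move
done while the namenode is stopped):

  $ mv dfs/name/current/edits.new dfs/name/current/edits.new.bak
  ... restart the cluster ...
  $ hadoop dfsadmin -safemode leave
  $ hadoop fsck /
  (fsck then reports HDFS as corrupt: missing blocks, etc.)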

The question I have is whether there is any way to salvage such a
situation. I have read that one can perhaps tamper with the edits and
edits.new files to bring up the namenode with minimal data loss. Would
this require editing the files in a hex editor?

Is there any documentation or an example of how to do this, or is it
perhaps not possible and not worth the effort? Either way it would be
good to know whether there is a way out of such a situation.
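
In case it helps anyone answer: the files can at least be inspected with
a plain hex dump, e.g.

  $ hexdump -C dfs/name/current/edits | less

As far as I understand, the first four bytes are the layout version and
the rest is a sequence of opcode records, so "tampering" would presumably
mean truncating the file after the last complete record. I am not sure of
the exact record format, hence the question.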

I have a 3-node test cluster running Hadoop 0.20.2+737.

I would appreciate any help or pointers.

Thanks,
Usman

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
