hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Corrupt DFS edits-file
Date Fri, 08 Dec 2006 23:59:57 GMT
Could you please add your comments to HADOOP-745?
http://issues.apache.org/jira/browse/HADOOP-745

It could be helpful for whoever is going to fix it.

Christian Kunz wrote:

>FYI: there is an open issue for this:
>HADOOP-745
>
>-Christian 
>
>-----Original Message-----
>From: Dhruba Borthakur [mailto:dhruba@yahoo-inc.com] 
>Sent: Friday, December 08, 2006 2:46 PM
>To: hadoop-user@lucene.apache.org
>Subject: RE: Corrupt DFS edits-file
>
>Hi Albert and Espen,
>
>To help debug this issue further, I have the following questions:
>1. Did you have more than one directory in dfs.name.dir?
>2. Was this a new cluster, or an existing cluster that was recently
>upgraded to 0.9.0?
>3. Did any abnormal NameNode restarts occur immediately before the problem
>started occurring?
>
>To make it easier to recover from such a corruption:
>1. Would it help to make the fsimage/edits files ASCII, so that they can be
>easily edited by hand?
>2. Does it make sense for HDFS to automatically create a directory
>equivalent to /lost+found? During EditLog processing, if the parent directory
>of a file does not exist, the file could go into /lost+found.
>
>Thanks,
>dhruba
>
>-----Original Message-----
>From: Albert Chern [mailto:albert.chern@gmail.com]
>Sent: Friday, December 08, 2006 1:43 PM
>To: hadoop-user@lucene.apache.org
>Subject: Re: Corrupt DFS edits-file
>
>This happened to me too, but the problem was the OP_MKDIR instructions were
>in the wrong order.  That is, in the edits file the parent directory was
>created after the child.  Maybe you should check to see if that's the case.
>
>I fixed it by using vi in combination with xxd.  When you have the file open
>in vi, press escape and issue the command ":%!xxd".  This will convert the
>binary file to a hex dump.  Then you can search through and perform the
>necessary edits.  I don't remember what the bytes were, but it was something
>like opcode, length of path (in binary), path.  After you're done, issue the
>command ":%!xxd -r" to convert it back to binary.  Remember to back up your
>files before you do this!
>I also had to strip off a trailing byte that got tacked on for some reason
>during the binary/hex conversion.
>
>Anyhow, this is a serious bug and could lead to data loss for a lot of
>people.  I think we should report it.
>
>On 12/8/06, Espen Amble Kolstad <espen@trank.no> wrote:
>
>>Hi,
>>
>>I run hadoop-0.9-dev and my edits-file has become corrupt. When I try 
>>to start the namenode I get the following error:
>>2006-12-08 20:38:57,431 ERROR dfs.NameNode -
>>java.io.FileNotFoundException: Parent path does not exist:
>>/user/trank/dotno/segments/20061208154235/parse_data/part-00000
>>        at org.apache.hadoop.dfs.FSDirectory$INode.addNode(FSDirectory.java:186)
>>        at org.apache.hadoop.dfs.FSDirectory.unprotectedMkdir(FSDirectory.java:714)
>>        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:254)
>>        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:191)
>>        at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:320)
>>        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:226)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:142)
>>        at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:134)
>>        at org.apache.hadoop.dfs.NameNode.main(NameNode.java:585)
>>
>>I've grepped through my edits-file to see what's wrong. It seems the
>>edits-file is missing an OP_MKDIR for
>>/user/trank/dotno/segments/20061208154235/parse_data.
>>
>>Is there a tool for fixing an edits-file, or for inserting an OP_MKDIR?
>>
>>- Espen
>>
>
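For anyone who would rather script the hex round trip Albert describes than do it inside vi, here is a minimal sketch in Python. It is not a Hadoop tool: the function names dump_to_hex and hex_to_binary and the ".bak" suffix are made up for illustration, and the hex text is a plain byte dump rather than xxd's exact output format. It copies the file before touching it, and because it never passes the data through a text editor buffer, nothing should get appended on the way back to binary.

```python
import shutil
from pathlib import Path


def dump_to_hex(src: Path, hex_path: Path) -> None:
    """Back up src as '<name>.bak', then write its bytes as hex text.

    Each output line holds up to 16 bytes as space-separated hex pairs.
    (Hypothetical helper, not part of Hadoop or xxd.)
    """
    shutil.copy2(src, src.with_name(src.name + ".bak"))
    data = src.read_bytes()
    lines = [data[i:i + 16].hex(" ") for i in range(0, len(data), 16)]
    hex_path.write_text("\n".join(lines) + "\n")


def hex_to_binary(hex_path: Path, dst: Path) -> None:
    """Convert the (possibly hand-edited) hex text back to raw bytes."""
    text = hex_path.read_text()
    # Drop all whitespace so only the hex digits remain.
    dst.write_bytes(bytes.fromhex("".join(text.split())))


# Example usage (paths are placeholders):
#   dump_to_hex(Path("edits"), Path("edits.hex"))
#   ... edit edits.hex by hand ...
#   hex_to_binary(Path("edits.hex"), Path("edits.fixed"))
```

Writing the result to a separate file (edits.fixed above) rather than over the original leaves two untouched copies to fall back on if the hand edit goes wrong.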

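Both failure modes reported in this thread (a missing OP_MKDIR for the parent, and a parent created after its child) come down to the same check: every directory's parent must already exist when its mkdir is replayed. The sketch below illustrates that check only. It assumes you have already pulled the directory paths out of the edits file in log order by some other means; it makes no assumption about the binary record layout, and find_orphan_mkdirs is a hypothetical name, not a Hadoop API.

```python
import posixpath


def find_orphan_mkdirs(mkdir_paths, existing=("/",)):
    """Return mkdir paths whose parent was not created earlier in the log.

    mkdir_paths: directory paths in the order they appear in the edits log.
    existing:    directories assumed to exist already (e.g. from the fsimage).
    """
    known = set(existing)
    orphans = []
    for path in mkdir_paths:
        parent = posixpath.dirname(path.rstrip("/")) or "/"
        if parent not in known:
            orphans.append(path)
        # Record the path either way so one bad entry is reported once
        # instead of cascading to all of its descendants.
        known.add(path)
    return orphans


# Example: "/a/b" is created after its child "/a/b/c".
print(find_orphan_mkdirs(["/a", "/a/b/c", "/a/b"]))  # prints ['/a/b/c']
```

A path reported here is a candidate for either moving its parent's mkdir earlier or adding the missing one; which of the two applies has to be judged from the log itself.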
