hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Corrupt DFS edits-file
Date Fri, 15 Dec 2006 19:27:00 GMT
Philippe,
Periodic checkpointing will bound the size of the edits file.
So it will not grow as big as it does now, and even if it gets
corrupted, only a relatively small amount of information will be lost,
compared to the current state, where one can lose weeks of
data if the name-node is not restarted periodically.
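
To illustrate why checkpointing bounds the possible loss, here is a minimal
sketch of a toy image-plus-edits model. It is not Hadoop's actual code: the
class CheckpointSketch, its methods, and the maxEdits threshold are all
made up for illustration, and it triggers on an edit count rather than on a
timer only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of a namespace image plus an edits log; not Hadoop's actual classes. */
class CheckpointSketch {
    private final TreeSet<String> image = new TreeSet<>();  // paths already checkpointed
    private final List<String> edits = new ArrayList<>();   // paths created since the last checkpoint
    private final int maxEdits;                              // hypothetical checkpoint threshold

    CheckpointSketch(int maxEdits) {
        this.maxEdits = maxEdits;
    }

    /** Record one edit and checkpoint once the log reaches the threshold. */
    void logCreate(String path) {
        edits.add(path);
        if (edits.size() >= maxEdits) {
            checkpoint();
        }
    }

    /** Merge the pending edits into the image and start an empty log. */
    void checkpoint() {
        image.addAll(edits);
        edits.clear();
    }

    /** Number of edits that would be at risk if the log were corrupted right now. */
    int pendingEdits() {
        return edits.size();
    }

    int imageSize() {
        return image.size();
    }
}
```

In this model a corrupted log can cost at most maxEdits - 1 operations,
whatever the uptime of the name-node.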

Another thing is that the name-node should fall into safe mode when a 
log edit transaction fails, and
wait until the administrator fixes the problem and turns safe mode off.
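
A rough sketch of that behavior, under the assumption that every namespace
change goes through a single method that writes to the log first. The names
SafeModeSketch, EditLog, mutate and leaveSafeMode are hypothetical and do
not correspond to the real name-node API.

```java
import java.io.IOException;

/** Toy name-node front end; names are illustrative, not Hadoop's API. */
class SafeModeSketch {
    /** Stand-in for the on-disk edits log. */
    interface EditLog {
        void append(String op) throws IOException;
    }

    private final EditLog log;
    private boolean safeMode = false;

    SafeModeSketch(EditLog log) {
        this.log = log;
    }

    /** Accept a namespace change only if it can be logged; returns false if rejected. */
    boolean mutate(String op) {
        if (safeMode) {
            return false;            // no changes until an administrator intervenes
        }
        try {
            log.append(op);
            return true;
        } catch (IOException e) {
            safeMode = true;         // a failed log write freezes the namespace
            return false;
        }
    }

    boolean inSafeMode() {
        return safeMode;
    }

    /** Called by the administrator once the underlying problem is fixed. */
    void leaveSafeMode() {
        safeMode = false;
    }
}
```

The point of the design is that no change is acknowledged after the first
failed write, so the log never silently falls behind the in-memory state.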

Espen,
I once had a corrupted edits file. I don't remember what was corrupted,
but the behavior was similar: the name-node
wouldn't start. I added some custom code to FSImage.loadFSImage to
deal with the inconsistency.
Once the correct image was created I discarded the custom code.
In your case the log is trying to create a directory named
/user/trank/dotno/segments/20061208154235/parse_data/part-00000
which is wrong, since part-00000 is supposed to be a file.
Have you already restored your image?
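
One way to find such entries before attempting a real start is a dry-run
replay that collects the offending operations instead of throwing. The
sketch below is only an illustration: it models the log as a plain list of
mkdir paths, not the real edits-file format, and EditsCheckSketch,
findOrphans and parentOf are hypothetical names. It is not the custom code
mentioned above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Dry-run replay of mkdir operations; a toy model, not the real edits-file format. */
class EditsCheckSketch {
    /** Returns the mkdir targets whose parent directory does not exist at replay time. */
    static List<String> findOrphans(Set<String> existingDirs, List<String> mkdirs) {
        Set<String> dirs = new HashSet<>(existingDirs);
        dirs.add("/");                       // the root always exists
        List<String> orphans = new ArrayList<>();
        for (String path : mkdirs) {
            if (dirs.contains(parentOf(path))) {
                dirs.add(path);              // this operation would replay cleanly
            } else {
                orphans.add(path);           // this one would fail: parent path does not exist
            }
        }
        return orphans;
    }

    /** Parent of an absolute path such as "/a/b"; the parent of "/a" is "/". */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```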

--Konstantin

Philippe Gassmann wrote:

>
> Espen Amble Kolstad wrote:
>
>> Hi,
>>
>> I run hadoop-0.9-dev and my edits-file has become corrupt. When I try to
>> start the namenode I get the following error:
>> 2006-12-08 20:38:57,431 ERROR dfs.NameNode -
>> java.io.FileNotFoundException: Parent path does not exist:
>> /user/trank/dotno/segments/20061208154235/parse_data/part-00000
>>         at
>> org.apache.hadoop.dfs.FSDirectory$INode.addNode(FSDirectory.java:186)
>>         at
>> org.apache.hadoop.dfs.FSDirectory.unprotectedMkdir(FSDirectory.java:714)
>>         at 
>> org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:254)
>>         at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:191)
>>         at
>> org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:320)
>>         at 
>> org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:226)
>>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:142)
>>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:134)
>>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:585)
>>
>> I've grep'ed through my edits-file, to see what's wrong. It seems the
>> edits-file is missing an OP_MKDIR for
>> /user/trank/dotno/segments/20061208154235/parse_data.
>>
>> Is there a tool for fixing an edits-file, or to put in an OP_MKDIR ?
>>
>> - Espen
>
>
> Hi all,
>
> Some time ago, I had a similar issue 
> (http://issues.apache.org/jira/browse/HADOOP-760 that duplicates 
> http://issues.apache.org/jira/browse/HADOOP-227).
>
> My first thought was to do automatic checkpointing by 
> merging the edits log into the fsimage (as described in HADOOP-227).
>
> But this approach does not work if the edits logs are corrupted 
> (i.e. cannot be merged). So I believe we should think about another 
> recovery method.
>
> AFAIK, datanodes are only aware of the blocks they own. I think 
> we could store a little more information with each block: the path 
> on the filesystem and the block number. If the namenode has crashed 
> completely (corrupted edit logs), the fs image could be rebuilt quite 
> easily by querying all datanodes about their blocks.
>
> WDYT ?
>
> cheers,
> -- 
> Philippe.
>
>
>
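
For reference, the rebuild that Philippe proposes in the quoted message
could look roughly like the sketch below. It assumes each data-node can
report, for every block, the file path and the block's position in that
file, which is exactly the extra information the proposal would add. The
types RebuildSketch and BlockReport are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy reconstruction of a file-to-blocks map from data-node reports. */
class RebuildSketch {
    /** Hypothetical per-block metadata a data-node would have to store. */
    static final class BlockReport {
        final String path;
        final int index;
        final long blockId;

        BlockReport(String path, int index, long blockId) {
            this.path = path;
            this.index = index;
            this.blockId = blockId;
        }
    }

    /** Groups reported blocks by file path, ordered by block index; replicas collapse. */
    static Map<String, List<Long>> rebuild(List<BlockReport> reports) {
        Map<String, TreeMap<Integer, Long>> byPath = new TreeMap<>();
        for (BlockReport r : reports) {
            byPath.computeIfAbsent(r.path, p -> new TreeMap<>()).put(r.index, r.blockId);
        }
        Map<String, List<Long>> files = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Integer, Long>> e : byPath.entrySet()) {
            files.put(e.getKey(), new ArrayList<>(e.getValue().values()));
        }
        return files;
    }
}
```

In this toy model, anything that owns no blocks (empty files, empty
directories) never shows up in a report and so cannot be recovered this way.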

