hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs
Date Mon, 05 Apr 2010 03:27:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853290#action_12853290 ]

Todd Lipcon commented on HDFS-1073:

bq.  what are the pros-and-cons of numbering the files sequentially, fsimage_0, fsimage_1,
etc vs appending the last known transaction into the filename?

Interesting question. The pro I can think of for sequential numbering (0,1,2...) is that we
can determine whether there is a "gap" in edit logs without looking at file contents. For
example, if we see edits_0, edits_1, edits_3 we know that this edits directory is corrupt
since we missed edits_2. Whereas with txn IDs we can only detect a gap by reading through
the entirety of the file and counting transactions.
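
A rough sketch of that filename-only check, assuming files are named edits_<index> as in the example above. The function name find_missing_indices is hypothetical and is not part of HDFS:

```python
import re

# Assumed naming scheme from the discussion: edits_<index>
EDITS_RE = re.compile(r"^edits_(\d+)$")


def find_missing_indices(filenames):
    """Return indices missing between the lowest and highest edits file seen.

    Only filenames are inspected; no file contents are read.
    """
    indices = sorted(
        int(m.group(1)) for m in map(EDITS_RE.match, filenames) if m
    )
    if not indices:
        return []
    present = set(indices)
    return [i for i in range(indices[0], indices[-1] + 1) if i not in present]


# The example from the comment: edits_2 is missing.
print(find_missing_indices(["edits_0", "edits_1", "edits_3"]))  # prints [2]
```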

The pro of txid numbering is that we can detect the case where some middle log got truncated.
For example, if we have edits_0, edits_1000, and edits_2000, but edits_1000 only contains
500 edits, we can fail at that point.
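
A minimal sketch of that truncation check, assuming each file is named after its first txid, that txids within a file are contiguous, and that the transaction count has already been obtained by reading the file. The function name find_truncated_logs is hypothetical:

```python
def find_truncated_logs(logs):
    """Return the first txid of every log that ends before the next log begins.

    logs: list of (first_txid, transaction_count) pairs, one per edits file.
    Assumes txids inside a file run contiguously from first_txid.
    """
    ordered = sorted(logs)
    truncated = []
    for (first, count), (next_first, _) in zip(ordered, ordered[1:]):
        # A complete log should end exactly where the next one starts.
        if first + count < next_first:
            truncated.append(first)
    return truncated


# edits_0 holds 1000 edits, edits_1000 only 500, edits_2000 holds 1000.
print(find_truncated_logs([(0, 1000), (1000, 500), (2000, 1000)]))  # prints [1000]
```

Note that the last log cannot be checked this way, since there is no following filename to compare against.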

However, there's nothing stopping us from getting the benefits of both - we could either make
the filenames something like edits_<idx>_<first txid>, or just make sure we store
the first txid in the header of the edit log.
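
To illustrate the combined filename option, here is a sketch that parses edits_<idx>_<first txid> names and reports problems visible from the names alone. EditsName, parse_edits_name and check_directory are hypothetical names, and the exact filename layout is an assumption based on the pattern above:

```python
import re
from typing import NamedTuple

# Assumed layout: edits_<idx>_<first txid>
NAME_RE = re.compile(r"^edits_(\d+)_(\d+)$")


class EditsName(NamedTuple):
    index: int
    first_txid: int


def parse_edits_name(name):
    """Split an edits filename into its index and first txid."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an edits file name: {name!r}")
    return EditsName(int(m.group(1)), int(m.group(2)))


def check_directory(names):
    """Return a list of problems detectable without opening any file."""
    parsed = sorted(parse_edits_name(n) for n in names)
    problems = []
    for prev, cur in zip(parsed, parsed[1:]):
        if cur.index != prev.index + 1:
            problems.append(f"index jumps from {prev.index} to {cur.index}")
        if cur.first_txid <= prev.first_txid:
            problems.append(f"first txid does not increase at index {cur.index}")
    return problems


print(check_directory(["edits_0_0", "edits_1_1000", "edits_3_3000"]))
```

Detecting a truncated middle log would still require counting the transactions in each file and comparing against the next file's first txid.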

Sanjay mentioned "it decouples the split (ie roll) of the edit log and the checkpoint of the
image" but I'm not sure what he meant by that. I think we can still achieve the same goal
using indexed files, as long as each roll increments the index. So, if we roll three times
but only succeed in checkpointing once, we'd see fsimage_0, edits_0, edits_1, edits_2, fsimage_2,
edits_3 (where fsimage_0 and edits_0 through edits_2 may be GCed according to the ageout policy).

bq. this is very different from what we currently got in the trunk. And this is a heavyweight

Agreed, this is a large change. However, I think it will reduce the amount of complicated state machine
code, and we know there are several very tricky bugs in the trunk implementation. I think
this simpler design will be easier to understand and thus harder to introduce bugs into. Plus,
it has the nice property that even if there is a bug, it will be _very_ hard to write one that
corrupts the data, since old versions can be lazily deleted and are never modified after close.

> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
> The naming and handling of NN's fsImage and edit logs can be significantly improved,
resulting in simpler and more robust code.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
