hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs
Date Fri, 05 Nov 2010 19:04:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928764#action_12928764 ]

Todd Lipcon commented on HDFS-1073:

Hey all. Back in town after a few weeks in Japan, sorry for the relative absence.

bq. I do not see or did not understand the rationale for "I'm quitting!" record. Why should
NN care whether last record was lost or not, just keep going with what it has. Worked so far.

I think one complication here is that we currently never have to re-open an edits file for
append: on startup, if there were any edits to apply, we always save a "fresh" checkpoint image
and an empty "edits" file. One advantage of the new design is that we no longer have to
do this - we just bump the edits log number to the next one in sequence, i.e. we roll on startup
if the latest edit log is non-empty.
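
To make the roll-on-startup idea concrete, here is a rough sketch in Python. It is not code
from the patch: EditLogDir, roll and open_for_startup are hypothetical names, and the edits
directory is modelled as an in-memory dict rather than real files.

```python
from dataclasses import dataclass, field


@dataclass
class EditLogDir:
    """Toy in-memory stand-in for an edits directory (hypothetical, not HDFS code)."""
    # Maps a log sequence number to the list of edit records in that log.
    logs: dict = field(default_factory=lambda: {1: []})

    def latest(self) -> int:
        """Return the highest log sequence number present."""
        return max(self.logs)

    def roll(self) -> int:
        """Start the next log in sequence and return its number."""
        nxt = self.latest() + 1
        self.logs[nxt] = []
        return nxt


def open_for_startup(d: EditLogDir) -> int:
    """Return the log number to write to after startup.

    If the latest log already holds edits, roll to the next number instead
    of re-opening it for append or saving a fresh checkpoint image.
    """
    if d.logs[d.latest()]:
        return d.roll()
    return d.latest()
```

With this model, starting up on an empty log keeps writing to it, while starting up on a
non-empty log leaves that log untouched and moves on to the next number.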

bq. Also the "rolled" transaction is a nice way to tell the BN that the primary did a roll
without any special message from NN to BNN

The patch currently does exactly that - we just don't write down the special "roll" entry
in any file streams. We certainly could, though, if it's useful to know that a file was completely

bq. Todd, I briefly looked at the patch. It looks like you are trying to get rid of the Journal
Spool in BN. Correct me if I am wrong. I don't think you can

In the patch, the spooling has just become a bit more of a general case. Rather than spooling
to a special file, we simply ask the primary NN to roll, and then wait for the roll to happen.
While waiting for the roll, we continue to apply edits. Once we get the special "roll" record,
we stop applying edits and make a checkpoint at that point. Once the checkpoint completes,
we "converge" by continuing to read forward in the sequence of log files until we hit the
end and are back "in sync".
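
The sequence above can be sketched roughly as follows. This is a simplified, single-threaded
illustration and not the patch itself: the ROLL marker, checkpoint_and_converge, and the use
of a plain list as the namespace are all assumptions made for the example. In a real node
the checkpoint takes time, and the edits read afterwards would come from later log files.

```python
ROLL = "ROLL"  # hypothetical marker standing in for the special "roll" record


def checkpoint_and_converge(stream):
    """Apply edits from `stream`, checkpointing at the first roll record.

    `stream` is an iterable of edit records as the backup node would see
    them after asking the primary to roll. Returns (checkpoint, state):
    the snapshot taken at the roll point, and the state after reading
    forward to the end of the stream.
    """
    state = []         # stands in for the in-memory namespace
    checkpoint = None  # no checkpoint until a roll record is seen
    for record in stream:
        if record == ROLL:
            if checkpoint is None:
                # Edits up to the roll point are applied; snapshot them.
                checkpoint = list(state)
            continue  # a roll record carries no namespace change
        # Before the roll this is normal applying of edits; after the
        # checkpoint it is the "converge" phase of catching up.
        state.append(record)
    return checkpoint, state
```

For example, a stream of two edits, a roll record, and one more edit gives a checkpoint
holding the first two edits and a final state holding all three.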

bq. A backup NN should not ask for a roll. The primary should roll when it feels it is necessary.

I think the simplest approach is to let anyone ask for a roll, i.e. CN, BN, or NN. The NN of course
is the one that actually makes the decision, but the decision may be in response to a request
from one of the other nodes. I think this ability is useful not just for CN, BN, and NN, but
also for example in backup scripts - you may ask the NN to roll right before making a tarball
of the edits directory, and thus be sure that you get all of the current edits in "finalized"

> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1073.txt, hdfs1073.pdf
> The naming and handling of NN's fsImage and edit logs can be significantly improved,
> resulting in simpler and more robust code.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
