Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <28551218.457161282256960439.JavaMail.jira@thor>
Date: Thu, 19 Aug 2010 18:29:20 -0400 (EDT)
From: "Todd Lipcon (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image
 and edit Logs
In-Reply-To: <1207504890.648331270171887350.JavaMail.jira@brutus.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900499#action_12900499 ] 

Todd Lipcon commented on HDFS-1073:
-----------------------------------

Hey Sanjay,

Thanks for reviving this. The notes you wrote above seem accurate.

Couple of questions:

bq. while writing edit logs to multiple files, a failure of the system can result in different amounts of data written to each file - the tid allows one to pick one with the most transactions.

Isn't this also doable by just seeing which has more non-zero bytes? ie seek to the end of the file, scan backwards through the 0 bytes, and stop. Whichever valid log is longer wins. Even in the case with the transaction-id, you have to do something like this for a few reasons: a) we'd rather scan backward from the end of the edit log than forward from the beginning, since it's going to be a faster startup, and b) even if we see a higher transaction id header on the last entry, that entry might have been incompletely written to the file, so we still have to verify that it deserializes correctly.
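
To make the backward scan concrete, here is a rough sketch. It is illustration only, not NameNode code: the class and method names (EditLogTail, lengthWithoutTrailingZeros, pickLongest) are made up, and it assumes the tail of each file is zero-filled preallocation. It only finds the candidate with the most non-zero bytes; per point (b), the last entry would still have to be checked for correct deserialization.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;

// Hypothetical helper, not part of HDFS.
public class EditLogTail {

    /** Returns the offset just past the last non-zero byte (0 if empty or all zeros). */
    static long lengthWithoutTrailingZeros(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            byte[] buf = new byte[4096];
            long end = raf.length();
            // Walk backwards one chunk at a time, starting from the end of the file.
            while (end > 0) {
                int n = (int) Math.min(buf.length, end);
                raf.seek(end - n);
                raf.readFully(buf, 0, n);
                for (int i = n - 1; i >= 0; i--) {
                    if (buf[i] != 0) {
                        return end - n + i + 1;
                    }
                }
                end -= n;
            }
            return 0;
        }
    }

    /** Picks the file with the most bytes before its zero padding (first one wins ties). */
    static File pickLongest(List<File> logs) throws IOException {
        File best = null;
        long bestLen = -1;
        for (File f : logs) {
            long len = lengthWithoutTrailingZeros(f);
            if (len > bestLen) {
                best = f;
                bestLen = len;
            }
        }
        return best;
    }
}
```

One caveat with this sketch: a fully written edit could itself end in zero bytes, so the trimmed length is only a starting point for validation, not proof of where the last good edit ends.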

bq. Main disadvantage is that the editlogs will be a little bigger.

So are you suggesting that each edit will include a header with the transaction ID in it? Isn't this redundant if the header of the whole edit file has the starting txid -- ie is there ever a case where we'd skip a txid?
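
To spell out the redundancy argument: if the file header records the first txid and no txid is ever skipped, each edit's txid follows from its position in the file. A small sketch, with hypothetical names (ImpliedTxId, txIdOf, matchesImplied) and assuming strictly consecutive txids:

```java
// Hypothetical illustration, not part of HDFS.
public class ImpliedTxId {

    /** Txid of the edit at a zero-based position, given the first txid in the file header. */
    static long txIdOf(long firstTxId, long indexInFile) {
        return firstTxId + indexInFile;
    }

    /** True if explicit per-edit txids carry no information beyond the header, i.e. no gaps. */
    static boolean matchesImplied(long firstTxId, long[] perEditTxIds) {
        for (int i = 0; i < perEditTxIds.length; i++) {
            if (perEditTxIds[i] != txIdOf(firstTxId, i)) {
                return false;
            }
        }
        return true;
    }
}
```

If matchesImplied can never return false in practice, the per-edit header only adds bytes; if it can, that is the case where the explicit txid earns its space.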

bq. In order to do an offline fsck one needs to dump the block map; clearly one does not want to lock the system to do an atomic dump. The transaction id of when the dump is started can be written in the dump to allow the fsck to report consistently.

Sorry, can you elaborate a little bit here? In order to get a consistent dump of the block map don't we need to take the FSN lock and thus stall all operations? Is the idea that the BackupNode would do the blockmap dump offline since it can hold a lock for some time without stalling clients? If that's the case, what's the purpose of the offline nature of the fsck instead of just having BackupNode allow fsck to point directly at it and access memory under the same lock?

Mahadev said:
bq. Is it the minimum set of code changes that is making you guys reject on the txn based snapshots and logging?

I don't think either way has been decided/rejected yet. What you're saying has been my view - that doing txid based is a bigger change, since we have to introduce the txid concept and add extra code that allows replaying partial edit log files (ie a subrange of the edits within). But it's certainly doable and Sanjay has presented some good advantages.
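
For a sense of what "replaying a subrange" might involve, here is a minimal sketch. Everything in it is an assumption for illustration: the name PartialReplay, the idea that a file's edits are available as an in-memory list, and the rule that txids are consecutive starting from the file's first txid.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration, not part of HDFS.
public class PartialReplay {

    /**
     * Applies only the edits whose txid lies in [fromTxId, toTxId], inclusive.
     * The txid of each edit is derived from its position in the file.
     * Returns the number of edits applied.
     */
    static <E> int replayRange(long firstTxId, List<E> edits,
                               long fromTxId, long toTxId, Consumer<E> apply) {
        int applied = 0;
        for (int i = 0; i < edits.size(); i++) {
            long txId = firstTxId + i;
            if (txId < fromTxId) {
                continue; // already covered, e.g. by the image being loaded
            }
            if (txId > toTxId) {
                break; // past the requested range
            }
            apply.accept(edits.get(i));
            applied++;
        }
        return applied;
    }
}
```

The extra code is small in a sketch like this; the larger cost is threading the txid concept through everything that names, rolls, and loads images and edit files.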


> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Todd Lipcon
>         Attachments: hdfs1073.pdf
>
>
> The naming and handling of NN's fsImage and edit logs can be significantly improved, resulting in simpler and more robust code.
