hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs
Date Thu, 19 Aug 2010 22:29:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900499#action_12900499 ]

Todd Lipcon commented on HDFS-1073:
-----------------------------------

Hey Sanjay,

Thanks for reviving this. The notes you wrote above seem accurate.

Couple of questions:

bq. while writing edit logs to multiple files, a failure of the system can result in different
amounts of data written to each file - the tid allows one to pick the one with the most transactions.

Isn't this also doable by just seeing which has more non-zero bytes? ie seek to the end of
the file, scan backwards through the 0 bytes, and stop. Whichever valid log is longer wins.
Even in the case with the transaction-id, you have to do something like this for a few reasons:
a) we'd rather scan backward from the end of the edit log than forward from the beginning,
since it's going to be a faster startup, and b) even if we see a higher transaction id header
on the last entry, that entry might have been incompletely written to the file, so we still
have to verify that it deserializes correctly.
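
To make that concrete, here's roughly the backward scan I have in mind -- a sketch only,
assuming the preallocated tail of the file is zero-filled (class and method names are made
up, not actual NN code):

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: find where the valid data ends in an edit log file,
// assuming the preallocated tail is zero-filled.
class EditLogTailScan {
  static long nonZeroLength(RandomAccessFile f) throws IOException {
    long pos = f.length();
    byte[] buf = new byte[4096];
    while (pos > 0) {
      int toRead = (int) Math.min(buf.length, pos);
      f.seek(pos - toRead);
      f.readFully(buf, 0, toRead);
      // walk the chunk back to front looking for the last non-zero byte
      for (int i = toRead - 1; i >= 0; i--) {
        if (buf[i] != 0) {
          return pos - toRead + i + 1; // offset just past the last non-zero byte
        }
      }
      pos -= toRead;
    }
    return 0; // file is entirely zeros
  }
}
{code}

Whichever log has the larger non-zero length wins, modulo the caveat in (b) that the final
record still has to deserialize cleanly.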

bq. Main disadvantage is that the editlogs will be a little bigger.

So are you suggesting that each edit will include a header with the transaction ID in it?
Isn't this redundant if the header of the whole edit file has the starting txid -- ie is there
ever a case where we'd skip a txid?
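
Just to spell out the redundancy I mean: if the file header carries the starting txid and
no txid is ever skipped, a replay loop can reconstruct every edit's txid from a running
counter. A sketch, with hypothetical stand-in types rather than actual NN code:

{code:java}
import java.util.Iterator;

// Sketch only: hypothetical stand-in types, not actual NN code.
class ReplaySketch {
  static class Namespace { /* stand-in for the namesystem */ }
  interface EditOp { void applyTo(Namespace ns); }

  // Returns the txid of the last edit applied. With contiguous txids,
  // headerStartTxid plus a running counter identifies every edit, so a
  // per-record txid would only serve as a sanity check.
  static long replay(long headerStartTxid, Iterator<EditOp> ops, Namespace ns) {
    long txid = headerStartTxid - 1;
    while (ops.hasNext()) {
      ops.next().applyTo(ns);
      txid++;
    }
    return txid;
  }
}
{code}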

bq. In order to do an offline fsck one needs to dump the block map; clearly one does not
want to lock the system to do an atomic dump. The transaction id of when the dump is
started can be written in the dump to allow the fsck to report consistently.

Sorry, can you elaborate a little bit here? In order to get a consistent dump of the block
map, don't we need to take the FSN lock and thus stall all operations? Is the idea that the
BackupNode would do the blockmap dump offline, since it can hold a lock for some time without
stalling clients? If that's the case, what's the purpose of making the fsck offline instead
of just having fsck point directly at the BackupNode and access memory under the same lock?

Mahadev said:
bq. Is it the minimum set of code changes that is making you guys reject the txn based
snapshots and logging?

I don't think either way has been decided/rejected yet. What you're saying has been my view
- that doing it txid-based is a bigger change, since we have to introduce the txid concept
and add extra code that allows replaying partial edit log files (ie a subrange of the edits
within; sketched below). But it's certainly doable and Sanjay has presented some good
advantages.
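
For the record, the "extra code" I mean is roughly this kind of subrange replay -- again
just a sketch, reusing the hypothetical EditOp/Namespace types from above:

{code:java}
// Sketch: apply only the edits in [fromTxid, toTxid] from a log file
// whose first record has txid headerStartTxid. Types as in the sketch
// above, all hypothetical.
static long replayRange(long headerStartTxid, long fromTxid, long toTxid,
                        Iterator<EditOp> ops, Namespace ns) {
  long txid = headerStartTxid;
  long applied = 0;
  while (ops.hasNext() && txid <= toTxid) {
    EditOp op = ops.next();
    if (txid >= fromTxid) { // skip edits already reflected in the image
      op.applyTo(ns);
      applied++;
    }
    txid++;
  }
  return applied; // number of edits actually applied
}
{code}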


> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>            Assignee: Todd Lipcon
>         Attachments: hdfs1073.pdf
>
>
> The naming and handling of NN's fsImage and edit logs can be significantly improved,
> resulting in simpler and more robust code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

