hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1984) HDFS-1073: Enable multiple checkpointers to run simultaneously
Date Wed, 25 May 2011 00:58:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038898#comment-13038898 ]

Todd Lipcon commented on HDFS-1984:
-----------------------------------

bq. Can't these two threads in the test race? I imagine they never would in practice.

It's OK: delayer.waitForCall simply blocks until the checkpoint thread reaches
the instrumented method. The same approach works pretty well in the TestFileAppend4 tests.
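
For anyone following along, here is a minimal sketch of why the wait removes the race. It is not the actual test helper: the class name CallDelayer and the methods onInstrumentedCall and proceed are hypothetical, and only waitForCall is a name taken from the test. The idea is two latches, one the worker trips on arrival and one the test trips to release it.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical stand-in for the test's delayer (names are assumptions).
 * It parks the worker thread inside the instrumented method and lets the
 * test thread block until the worker has actually arrived there.
 */
final class CallDelayer {
    // Counted down by the worker when it enters the instrumented method.
    private final CountDownLatch arrived = new CountDownLatch(1);
    // Counted down by the test when the worker is allowed to continue.
    private final CountDownLatch released = new CountDownLatch(1);

    /** Called by the worker thread from inside the instrumented method. */
    void onInstrumentedCall() throws InterruptedException {
        arrived.countDown();   // signal arrival to the test thread
        released.await();      // park until the test calls proceed()
    }

    /** Called by the test thread: blocks until the worker has arrived. */
    void waitForCall() throws InterruptedException {
        arrived.await();
    }

    /** Called by the test thread: lets the parked worker continue. */
    void proceed() {
        released.countDown();
    }
}
```

Whichever thread gets scheduled first, the test cannot get past waitForCall until the worker is inside the instrumented method, and the worker cannot leave it until the test calls proceed.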

bq. It should be rare that there's no MD5 file for an image, i.e. it only happens when there's
an image from a previous version. Would it therefore make sense to warn in places like setVerificationHeaders
where an MD5 file is not present?

This same code path is also used for transferring edits. Though perhaps we can add some flag
like "requireMd5File". I'll make a note of that as a TODO.
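
Roughly what I have in mind, as a sketch only. Nothing here is in the patch: the class Md5SidecarCheck, the Result enum, and the ".md5" sidecar suffix are all assumptions for illustration. Callers transferring an image would pass true, callers transferring edits would pass false.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; names and the ".md5" suffix are assumptions. */
final class Md5SidecarCheck {
    enum Result { PRESENT, MISSING_OK, MISSING_WARN }

    /**
     * Looks for a sidecar digest file next to the given file.
     * A missing digest is only worth a warning when the caller requires one.
     */
    static Result check(Path file, boolean requireMd5File) {
        Path md5 = file.resolveSibling(file.getFileName() + ".md5");
        if (Files.exists(md5)) {
            return Result.PRESENT;
        }
        return requireMd5File ? Result.MISSING_WARN : Result.MISSING_OK;
    }
}
```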

bq. Not your change, but it would be less error-prone if ErrorSimulation used, e.g., an enum CORRUPT_IMG_XFER
instead of "4".

Agreed.
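
A sketch of what the enum version could look like. This is not the existing ErrorSimulation class: SimulatedFault and FaultInjector are hypothetical names, and only CORRUPT_IMG_XFER comes from the suggestion above. The second constant is made up to show that more than one fault can be tracked.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical fault names; only CORRUPT_IMG_XFER is from the review comment. */
enum SimulatedFault {
    CORRUPT_IMG_XFER,
    SLOW_IMG_XFER // made-up second entry, for illustration only
}

/** Hypothetical replacement for index-based error simulation flags. */
final class FaultInjector {
    private static final Set<SimulatedFault> ACTIVE =
        EnumSet.noneOf(SimulatedFault.class);

    private FaultInjector() {}

    /** Turns the given simulated fault on. */
    static synchronized void set(SimulatedFault fault) {
        ACTIVE.add(fault);
    }

    /** Turns the given simulated fault off. */
    static synchronized void clear(SimulatedFault fault) {
        ACTIVE.remove(fault);
    }

    /** Returns true if the given simulated fault is currently on. */
    static synchronized boolean isSet(SimulatedFault fault) {
        return ACTIVE.contains(fault);
    }
}
```

A call site then reads FaultInjector.isSet(SimulatedFault.CORRUPT_IMG_XFER) rather than a bare index, and a typo becomes a compile error instead of silently testing the wrong fault.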

> HDFS-1073: Enable multiple checkpointers to run simultaneously
> --------------------------------------------------------------
>
>                 Key: HDFS-1984
>                 URL: https://issues.apache.org/jira/browse/HDFS-1984
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: hdfs-1984.txt
>
>
> One of the motivations of HDFS-1073 is that it decouples the checkpoint process so that
> multiple checkpoints could be taken at the same time and not interfere with each other.
> Currently on the 1073 branch this doesn't quite work right, since we have some state
> and validation in FSImage that's tied to a single fsimage_N -- thus if two 2NNs perform a
> checkpoint at different transaction IDs, only one will succeed.
> As a stress test, we can run two 2NNs each configured with the fs.checkpoint.interval
> set to "0" which causes them to continuously checkpoint as fast as they can.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
