hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3605) Block mistakenly marked corrupt during edit log catchup phase of failover
Date Sun, 15 Jul 2012 05:53:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414580#comment-13414580 ]

Uma Maheswara Rao G commented on HDFS-3605:

 Thanks a lot, Todd, for the patch.

I have taken a quick look at the patch. Yes, this approach should work as well.
Blocks will be processed for all the ops, so blocks matching the current genStamp will be
processed in the current iteration, and future ones will be postponed again.
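The process-or-postpone behavior described above can be sketched as a toy simulation (illustrative only, not actual NameNode code; the class and method names here are made up): queued reports whose genstamp is at or below the current one are processed in this iteration, while future-genstamp reports are postponed back onto the queue for a later pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class GenStampQueue {
    // Drain one iteration of the pending queue: accept reports that match
    // the current genstamp (or older), postpone future-genstamp reports.
    static List<Long> drain(Queue<Long> pending, long currentGenStamp) {
        List<Long> processed = new ArrayList<>();
        int n = pending.size();
        for (int i = 0; i < n; i++) {
            long gs = pending.poll();
            if (gs <= currentGenStamp) {
                processed.add(gs);   // matches current state: process now
            } else {
                pending.add(gs);     // future genstamp: postpone again
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<Long> pending = new ArrayDeque<>(List.of(1L, 3L, 2L));
        System.out.println("processed: " + drain(pending, 2L));
        System.out.println("postponed: " + pending);
    }
}
```

With multiple appends, each append bumps the genstamp, so each iteration processes only the reports that have caught up and re-queues the rest.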

A few comments on the patch. I did not check for any javadoc issues, since you already
mentioned that you will work on the javadocs.

+      out.writeBytes("/data");
+      // TODO: why do we need an hflush for this test case to fail?
I remember this was added just to ensure that the current packet is enqueued and the block
is allocated.
Otherwise, content smaller than 64K may not be flushed at that point.
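The buffering point above can be illustrated with plain java.io (a rough analogy only; DFSOutputStream's packet pipeline is more involved): a write smaller than the buffer/packet size is not visible downstream until an explicit flush, which is what the hflush() in the test forces.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {
    // Returns {bytes visible downstream before flush, bytes visible after flush}.
    static int[] writeSmallThenFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // 64K buffer, loosely analogous to the DFS packet size discussed above
        BufferedOutputStream out = new BufferedOutputStream(sink, 64 * 1024);
        out.write("/data".getBytes(StandardCharsets.UTF_8));
        int before = sink.size();   // the small write still sits in the local buffer
        out.flush();                // like hflush(): push the pending bytes downstream
        int after = sink.size();
        out.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] r = writeSmallThenFlush();
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]);
    }
}
```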

DFSTestUtil.appendFile(fs, fileToAppend, "data");
Having multiple append calls gives regression coverage for the case where we have many
genstamps: they are processed in order, and future ones are postponed.

// Wait till DN reports blocks
+      cluster.triggerBlockReports();
Does the comment need to be updated?

Do we need to change the variable name, since the blocks are not yet declared invalid?

I will take a deeper look at the patch again tomorrow. (I am not able to concentrate much,
as I am traveling today.)

> Block mistakenly marked corrupt during edit log catchup phase of failover
> -------------------------------------------------------------------------
>                 Key: HDFS-3605
>                 URL: https://issues.apache.org/jira/browse/HDFS-3605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, name-node
>    Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>            Reporter: Brahma Reddy Battula
>            Assignee: Todd Lipcon
>         Attachments: HDFS-3605.patch, TestAppendBlockMiss.java, hdfs-3605.txt
> Open file for append.
> Write data and sync.
> After the next log roll and edit log tailing in the standby NN, close the append stream.
> Call append multiple times on the same file, before the next edit log roll.
> Now abruptly kill the current active NameNode.
> Here the block is missed.
> This may be because all the latest blocks were queued in the Standby NameNode.
> During failover, the first OP_CLOSE processed the pending queue and added the block
> to the corrupt blocks list.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

