hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2691) HA: Tests and fixes for pipeline targets and replica recovery
Date Mon, 23 Jan 2012 21:43:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191496#comment-13191496 ]

Aaron T. Myers commented on HDFS-2691:

bq. My only worry with the above designs is that this might trigger the case of HDFS-2791...

Talked this over with Todd this morning, and we both agree that the fix he's working on for
HDFS-2791 should address this concern. Given that, "Solution 1" above seems to be clearly
the best way forward on this issue.

I've reviewed the latest patch and it largely looks good. A few comments, mostly nits:

# Perhaps we should rename PIPELINE_STARTED to RECEIVING_BLOCK? Seems more in line with the
other members of the BlockStatus enum, RECEIVED_BLOCK and DELETED_BLOCK. It's also more in
line with the new BPOfferService method, notifyNamenodeReceivingBlock.
# Given that we're now also handling the PIPELINE_STARTED case, perhaps we should rename the
BlockManager#blockReceivedAndDeleted method to reflect this additional function?
# In BlockManager#blockReceivedAndDeleted do you really think it's reasonable to only warn
here? I'd be in favor of at least bumping to ERROR, maybe even throwing an exception.
+          NameNode.stateChangeLog.warn(
+              "Unknown block status code reported by " + nodeID.getName() +
+              ": " + rdbi);
# Typo: preceeds -> precedes
# In DataNode#notifyNamenodeReceivingBlock, do you really think failing to find a BPOS for
the given BP ID is only worthy of a WARN and not an ERROR? I realize it's consistent with
DataNode#notifyNamenodeReceivedBlock and DataNode#notifyNamenodeDeletedBlock, but it seems
like they should all be ERROR.
# Does it make sense to rename the ReceivedDeletedBlockInfo class to something more general,
now that it's also being used for the PIPELINE_STARTED case?
# The comment at the top of ReceivedDeletedBlockInfo should probably also mention the fact
that it now stores BlockStatus as well.
# Any reason BlockStatus#code isn't private?
# Similar to above, seems like we should change the protobuf enum BlockStatus#PIPELINE_STARTED
-> BlockStatus#RECEIVING
# There are a few TODOs in the tests which reference HDFS-2693, which I think can be removed.
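
To make a few of the items above concrete (the RECEIVING_BLOCK rename, a private BlockStatus#code, and failing loudly on an unknown status code), here is a rough standalone sketch. It is not taken from the patch: the numeric codes, the class name BlockStatusSketch, and the fromCode helper are all assumptions made up for illustration, and the real enum may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockStatusSketch {

  /** Illustrative stand-in for the BlockStatus enum discussed above. */
  enum BlockStatus {
    RECEIVING_BLOCK(1),  // PIPELINE_STARTED in the current patch; code values are assumed
    RECEIVED_BLOCK(2),
    DELETED_BLOCK(3);

    // Keep the code private and expose it only through a getter.
    private final int code;

    // Reverse lookup table, filled once after the constants are created.
    private static final Map<Integer, BlockStatus> BY_CODE = new HashMap<>();
    static {
      for (BlockStatus s : values()) {
        BY_CODE.put(s.code, s);
      }
    }

    BlockStatus(int code) {
      this.code = code;
    }

    int getCode() {
      return code;
    }

    /** Hypothetical helper: throws on an unknown code instead of only warning. */
    static BlockStatus fromCode(int code) {
      BlockStatus s = BY_CODE.get(code);
      if (s == null) {
        throw new IllegalArgumentException("Unknown block status code: " + code);
      }
      return s;
    }
  }

  public static void main(String[] args) {
    System.out.println(BlockStatus.fromCode(1));
    try {
      BlockStatus.fromCode(42);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Whether an unknown code should throw or just be logged at ERROR is still the open question from the BlockManager#blockReceivedAndDeleted item; the sketch only shows what the stricter option might look like.
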
> HA: Tests and fixes for pipeline targets and replica recovery
> -------------------------------------------------------------
>                 Key: HDFS-2691
>                 URL: https://issues.apache.org/jira/browse/HDFS-2691
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2691.txt, hdfs-2691.txt
> Currently there are some TODOs around pipeline/recovery code in the HA branch. For example,
commitBlockSynchronization only gets sent to the active NN which may have failed over by that
point. So, we need to write some tests here and figure out what the correct behavior is.
> Another related area is the treatment of targets in the pipeline. When a pipeline is
created, the active NN adds the "expected locations" to the BlockInfoUnderConstruction, but
the DN identifiers aren't logged with the OP_ADD. So after a failover, the BlockInfoUnderConstruction
will have no targets and I imagine replica recovery would probably trigger some issues.
