hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2691) HA: Tests and fixes for pipeline targets and replica recovery
Date Sat, 21 Jan 2012 00:34:40 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2691:

    Attachment: hdfs-2691.txt

Attached patch is what I've been testing with on a cluster with HBase for a little while.

The approach is to send RBW replicas as part of the "block received and deleted" reports.
There are a couple of potential optimizations we could do here:
1) only do this when HA is enabled?
2) change the client so that when it calls hflush, it sends a flag to the DN which causes it to
report an RBW replica (so this only happens for blocks getting hsynced/hflushed)
3) only send these reports when a failover is detected (as discussed above)
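
To make the approach and the first two options concrete, here is a rough sketch of a DN-side queue that decides whether an RBW replica goes into the next report. Everything in it is hypothetical: RbwReportSketch, PendingReport, RbwPolicy, ReplicaEvent and the method names are made up for illustration and are not the classes in the attached patch or in HDFS. Option 3 (reporting only on failover) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative and are NOT the real HDFS classes.
public class RbwReportSketch {

    /** What a report entry says about a replica (names are assumptions). */
    enum ReplicaEvent { BEING_WRITTEN, RECEIVED, DELETED }

    /** When the DN includes RBW replicas in its report. */
    enum RbwPolicy { ALWAYS, HA_ONLY, FLUSHED_ONLY }

    /** One entry in a "block received and deleted" style report. */
    static final class ReportEntry {
        final long blockId;
        final ReplicaEvent event;

        ReportEntry(long blockId, ReplicaEvent event) {
            this.blockId = blockId;
            this.event = event;
        }
    }

    /** Entries queued on the DN until the next report is sent to the NN. */
    static final class PendingReport {
        private final RbwPolicy policy;
        private final boolean haEnabled;
        private final List<ReportEntry> entries = new ArrayList<>();

        PendingReport(RbwPolicy policy, boolean haEnabled) {
            this.policy = policy;
            this.haEnabled = haEnabled;
        }

        /** Queue an RBW entry if the configured policy allows it. */
        void onReplicaBeingWritten(long blockId, boolean flushedByClient) {
            boolean report;
            switch (policy) {
                case HA_ONLY:      report = haEnabled;       break; // option 1
                case FLUSHED_ONLY: report = flushedByClient; break; // option 2
                default:           report = true;            break; // always report
            }
            if (report) {
                entries.add(new ReportEntry(blockId, ReplicaEvent.BEING_WRITTEN));
            }
        }

        /** Finalized replicas are always reported. */
        void onReplicaFinalized(long blockId) {
            entries.add(new ReportEntry(blockId, ReplicaEvent.RECEIVED));
        }

        /** Deleted replicas are always reported. */
        void onReplicaDeleted(long blockId) {
            entries.add(new ReportEntry(blockId, ReplicaEvent.DELETED));
        }

        /** Return the queued entries and reset the queue, as if a report was sent. */
        List<ReportEntry> drain() {
            List<ReportEntry> out = new ArrayList<>(entries);
            entries.clear();
            return out;
        }
    }

    public static void main(String[] args) {
        PendingReport report = new PendingReport(RbwPolicy.FLUSHED_ONLY, true);
        report.onReplicaBeingWritten(1L, true);   // client flushed: queued
        report.onReplicaBeingWritten(2L, false);  // not flushed: skipped
        report.onReplicaFinalized(1L);
        for (ReportEntry e : report.drain()) {
            System.out.println(e.blockId + " " + e.event);
        }
    }
}
```

With ALWAYS every RBW replica is queued; HA_ONLY and FLUSHED_ONLY trade some NN-side knowledge for fewer report entries.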

Would really appreciate feedback on the correct design here.

I also plan to continue testing this - there's still some weirdness where RBW replicas show
up as "corrupt" for a short while after a failover, but then seem to fix themselves with no
further effort - maybe just a metrics thing.
> HA: Tests and fixes for pipeline targets and replica recovery
> -------------------------------------------------------------
>                 Key: HDFS-2691
>                 URL: https://issues.apache.org/jira/browse/HDFS-2691
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hdfs-2691.txt, hdfs-2691.txt
> Currently there are some TODOs around pipeline/recovery code in the HA branch. For example,
> commitBlockSynchronization only gets sent to the active NN which may have failed over by that
> point. So, we need to write some tests here and figure out what the correct behavior is.
> Another related area is the treatment of targets in the pipeline. When a pipeline is
> created, the active NN adds the "expected locations" to the BlockInfoUnderConstruction, but
> the DN identifiers aren't logged with the OP_ADD. So after a failover, the BlockInfoUnderConstruction
> will have no targets and I imagine replica recovery would probably trigger some issues.
