hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3744) Decommissioned nodes are included in cluster after switch which is not expected
Date Mon, 06 Aug 2012 13:41:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429144#comment-13429144 ]

Aaron T. Myers commented on HDFS-3744:

bq. And I would like to add Standby check at replication monitor to avoid load in cluster.

Got it. This seems like a separate issue from what's being discussed here, though, and so
should probably be done as a separate JIRA. Do you agree?

bq. By persisting into edit logs we can be sure of which DN is decommissioned? Not only by
Standby NN but also when Standalone NN restarts.

The question that I have is still "How would differences be rectified between what's persisted
in the edit log and what's present in the excluded hosts file?" Imagine that some host is
not present in the excluded hosts file, but a decommission action for that host is present
in the edit log. Given that edit logs are occasionally merged into an fsimage and the edit
logs discarded, this would imply that we'd need to introduce a new section into the fsimage
for per-host DN status. This means that we'd end up with two potentially out of sync lists
of DN decommission status: one in the excludes file, the other in this new section of the
fsimage file.
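
To make the "two lists" concern concrete, here is a minimal sketch of what checking for drift between them might look like. It assumes the excludes file is a plain list of hostnames, one per line, and that the per-host status section could be dumped to a similar text file. That second file is purely hypothetical, since no such fsimage section exists today, and the function name diff_decom_lists is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report hosts that appear in just one of two decommission lists.
#   $1 = excludes file (one hostname per line)
#   $2 = HYPOTHETICAL text dump of a persisted per-host decommission list
# diff_decom_lists is an illustrative name, not a Hadoop tool.
diff_decom_lists() {
    a=$(mktemp)
    b=$(mktemp)
    # comm needs sorted input; -u also drops duplicate entries.
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # Lines unique to the first file, then lines unique to the second.
    comm -23 "$a" "$b" | sed 's/^/only-in-excludes: /'
    comm -13 "$a" "$b" | sed 's/^/only-in-persisted: /'
    rm -f "$a" "$b"
}
```

Any line of output is a host on which the two sources disagree, and nothing in either file says which one should win.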

My point is that I think persisting DN decommission status to the edit log / fsimage is not
an unreasonable idea, but it does seem like an idea that's incompatible with the excluded
hosts config file. Given that, I'm still in favor of just requiring that the admin keep the excluded
hosts files in sync and call refreshNodes on both NNs from DFSAdmin. I think this argument
is further supported by the fact that the active/standby NN having an out of sync view of
DN decommission status isn't actually that big of a problem. Yes, it might result in some
unnecessary replication traffic, but it shouldn't result in data loss or unavailability, since
DNs already ignore replication commands from anything but the active NN.
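
For what it's worth, refreshing both NNs after updating the excludes files could be scripted along these lines. This is a rough sketch, not a tested procedure: the NameNode addresses are placeholders, refresh_all_nns and DRY_RUN are hypothetical names introduced here, and the -fs option is used the same way as in the reporter's scenario to address each NN explicitly.

```shell
#!/bin/sh
# Sketch only: run "dfsadmin -refreshNodes" against every NameNode URI given.
# refresh_all_nns and DRY_RUN are hypothetical names, not part of Hadoop.
refresh_all_nns() {
    rc=0
    for nn in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Print the command instead of running it.
            echo "hdfs dfsadmin -fs $nn -refreshNodes"
        else
            # Remember a failure but still try the remaining NameNodes.
            hdfs dfsadmin -fs "$nn" -refreshNodes || rc=1
        fi
    done
    return "$rc"
}

# Example with placeholder addresses:
# refresh_all_nns hdfs://nn1.example.com:8020 hdfs://nn2.example.com:8020
```

The loop keeps going after a failure so that one unreachable NN does not prevent the other from re-reading its excludes file. The non-zero return code then tells the admin that the two may be out of sync.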
> Decommissioned nodes are included in cluster after switch which is not expected
> -------------------------------------------------------------------------------
>                 Key: HDFS-3744
>                 URL: https://issues.apache.org/jira/browse/HDFS-3744
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.0.0-alpha, 2.1.0-alpha, 2.0.1-alpha
>            Reporter: Brahma Reddy Battula
> Scenario:
> =========
> Start ANN and SNN with three DN's
> Exclude DN1 from cluster by using decommission feature 
> (./hdfs dfsadmin -fs hdfs://ANNIP:8020 -refreshNodes)
> After decommission successful,do switch such that SNN will become Active.
> Here exclude node(DN1) is included in cluster.Able to write files to excluded node since it's not excluded.
> Checked SNN(Which Active before switch) UI decommissioned=1 and ANN UI 
> decommissioned=0
> One more Observation:
> ====================
> All dfsadmin commands will create proxy only on nn1 irrespective of Active or standby.I think this also we need to re-look once..
> I am not getting , why we are not given HA for dfsadmin commands..?
> Please correct me,,If I am wrong.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

