hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9390) Block management for maintenance states
Date Thu, 29 Sep 2016 17:36:20 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HDFS-9390:
    Attachment: HDFS-9390-3.patch

Thanks [~eddyxu] for the review! Here is the new patch.

bq. If we change it to the following code, we can undo most of the DatanodeManager.java changes,
whose motivation is not clear to me at first sight.
The main reason is that {{DatanodeManager#removeDatanode}} performs other operations, such as {{heartbeatManager.removeDatanode(nodeInfo);}}
and {{blockManager.getBlockReportLeaseManager().unregister(nodeInfo);}}, which should also be called
when a maintenance node becomes dead.

bq. Why does it not re-calculate stats when minReplicationToBeInMaintanence == 0?
Good catch. In addition to fixing it, the new patch also updates TestNamenodeCapacityReport
to cover the maintenance scenario.

bq. Is the comment correct in the context?

bq. One related question is: why are startMaintenance() and stopMaintenance() in DecommissionManager?
This mirrors startDecommission() and stopDecommission(), which are also in DecommissionManager. I plan
to rename DecommissionManager to AdminServiceManager as part of HDFS-9388.

bq. In NumberReplicas.java, you might want to consider renaming int maintenance() to int maintenanceReplicas,
and likewise for liveEnteringMaintence().

> Block management for maintenance states
> ---------------------------------------
>                 Key: HDFS-9390
>                 URL: https://issues.apache.org/jira/browse/HDFS-9390
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9390-2.patch, HDFS-9390-3.patch, HDFS-9390.patch
> When a node transitions into, stays in, or transitions out of maintenance state, we need
to make sure the blocks on that node are properly handled.
> * When nodes are put into maintenance, they first go to ENTERING_MAINTENANCE, and blocks
must be minimally replicated before the nodes are transitioned to IN_MAINTENANCE.
> * Do not replicate blocks when nodes are in maintenance states. A maintenance replica
remains in BlockMaps and thus is still considered valid from the block replication point of view.
In other words, putting a node into “maintenance” mode won’t trigger BlockManager to replicate
its blocks.
> * Do not invalidate replicas on nodes under maintenance. After any file's replication
factor is reduced, NN needs to invalidate some replicas. It should exclude nodes under maintenance
when doing so.
> * Do not put IN_MAINTENANCE replicas in LocatedBlock for read operations.
> * Do not allocate any new block on nodes under maintenance.
> * Have Balancer exclude nodes under maintenance.
> * Exclude nodes under maintenance for DN cache.
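
The rules above can be read as a small state machine plus a set of per-state predicates. The following is only an illustrative sketch in plain Java, not code from the patch: the class {{MaintenanceSketch}}, all method names, and the NORMAL state are hypothetical, and it assumes that "minimally replicated" means every block on the node has at least a configured minimum of live replicas on other nodes, and that "under maintenance" covers both ENTERING_MAINTENANCE and IN_MAINTENANCE. The Balancer and DN cache exclusions are not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the rules listed above; not the HDFS-9390 patch
// and not Hadoop's real classes or method names.
class MaintenanceSketch {
    // ENTERING_MAINTENANCE and IN_MAINTENANCE come from the issue text;
    // NORMAL is an assumed stand-in for a node in regular service.
    enum AdminState { NORMAL, ENTERING_MAINTENANCE, IN_MAINTENANCE }

    // Assumption: "under maintenance" covers both maintenance states.
    static boolean underMaintenance(AdminState s) {
        return s == AdminState.ENTERING_MAINTENANCE || s == AdminState.IN_MAINTENANCE;
    }

    // A node leaves ENTERING_MAINTENANCE only once every block it holds has
    // at least minReplicas live replicas on other nodes (assumed meaning of
    // "minimally replicated"). With minReplicas == 0 it moves on immediately.
    static AdminState nextState(AdminState current, int[] liveReplicasElsewhere, int minReplicas) {
        if (current != AdminState.ENTERING_MAINTENANCE) {
            return current;
        }
        for (int live : liveReplicasElsewhere) {
            if (live < minReplicas) {
                return AdminState.ENTERING_MAINTENANCE;
            }
        }
        return AdminState.IN_MAINTENANCE;
    }

    // Maintenance replicas stay valid for replication accounting, so putting
    // a node into maintenance does not trigger re-replication.
    static boolean countsAsValidReplica(AdminState s) {
        return true;
    }

    // Replicas on nodes under maintenance are not candidates for invalidation.
    static boolean mayInvalidateReplica(AdminState s) {
        return !underMaintenance(s);
    }

    // Only IN_MAINTENANCE replicas are hidden from readers.
    static boolean mayServeReads(AdminState s) {
        return s != AdminState.IN_MAINTENANCE;
    }

    // No new blocks are allocated on nodes under maintenance.
    static boolean mayReceiveNewBlocks(AdminState s) {
        return !underMaintenance(s);
    }

    // Filters a block's replica states down to those a reader may be given.
    static List<AdminState> readLocations(List<AdminState> replicas) {
        return replicas.stream()
                .filter(MaintenanceSketch::mayServeReads)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AdminState s = AdminState.ENTERING_MAINTENANCE;
        // One block has no live replica elsewhere, so the node must wait.
        System.out.println(nextState(s, new int[] {2, 0}, 1));
        // Every block has a live replica elsewhere, so the node can proceed.
        System.out.println(nextState(s, new int[] {2, 1}, 1));
        System.out.println(readLocations(Arrays.asList(AdminState.values())));
    }
}
```

Keeping each rule as a separate predicate makes it easy to see that the read path is the only one that treats ENTERING_MAINTENANCE differently from IN_MAINTENANCE in this reading of the description.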
