hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
Date Wed, 14 Jan 2015 21:05:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277677#comment-14277677 ]

Andrew Wang commented on HDFS-7411:
-----------------------------------

Hi Colin, thanks for reviewing. I'll rework the patch after we settle on the details; a few
replies:

bq. Shouldn't we be using that here, rather than creating our own list in decomNodeBlocks?

This is actually a feature, not a bug :) Having our own data structure lets us speed up decom
by only checking blocks that are still insufficiently replicated; we prune out the sufficiently
replicated ones each iteration. The memory overhead here should be pretty small since it's just
an 8B reference per block, so 1 million blocks is about 8MB for a single node, or maybe 160MB
for a full rack. Nodes are typically smaller than this, so these are conservative estimates,
and large decoms aren't that common.
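
To make that concrete, here's a rough sketch of the kind of per-node pruning structure I mean.
The class and method names are made up for illustration; the real code would use HDFS-internal
types like DatanodeDescriptor and BlockInfo, and the actual "sufficiently replicated" check
lives elsewhere:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch only, not the patch's actual classes.
class DecomTracker<N, B> {
  // Per decommissioning node, the blocks still known to be insufficiently
  // replicated. Each entry is just an 8-byte reference on a 64-bit JVM.
  private final Map<N, List<B>> decomNodeBlocks = new HashMap<>();

  void startTracking(N node, List<B> blocksOnNode) {
    decomNodeBlocks.put(node, new ArrayList<>(blocksOnNode));
  }

  // Run once per scan interval: only re-check blocks that were still
  // under-replicated last time, and drop the ones that are now fine.
  void prune(N node, Predicate<B> isSufficientlyReplicated) {
    List<B> pending = decomNodeBlocks.get(node);
    if (pending == null) {
      return;
    }
    for (Iterator<B> it = pending.iterator(); it.hasNext(); ) {
      if (isSufficientlyReplicated.test(it.next())) {
        it.remove();
      }
    }
  }

  // Decom for a node can finish once its pending list is empty.
  boolean isDone(N node) {
    List<B> pending = decomNodeBlocks.get(node);
    return pending != null && pending.isEmpty();
  }
}
{code}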

The one thing I could see as a nice improvement is that we could skip the final full scan
at the end of decom if we immediately propagate block map changes to decomNodeBlocks, but
that seems like more trouble than it's worth.

bq. have a configuration key like dfs.namenode.decommission.blocks.per.minute that expresses
directly what we want.

Thinking it over, I agree that just using a new config option is fine, but I'd prefer to
define the DecomManager in terms of both an interval and an amount of work, rather than a
rate. This is more powerful, and more in line with the existing config. Are you okay with
a new {{blocks.per.interval}} config?
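
As a rough sketch of why nothing is lost (field and key names here are placeholders, not
necessarily what the patch will use), an interval plus a per-interval block budget still
implies a rate:

{code:java}
// Hypothetical sketch only -- names are illustrative.
class DecomMonitorConfig {
  final long intervalSecs;      // existing scan interval, in seconds
  final int blocksPerInterval;  // proposed new per-interval work budget

  DecomMonitorConfig(long intervalSecs, int blocksPerInterval) {
    this.intervalSecs = intervalSecs;
    this.blocksPerInterval = blocksPerInterval;
  }

  // The two together still express a rate, e.g. 500000 blocks every 30s
  // is 1M blocks/minute, so this is at least as expressive as a
  // blocks.per.minute key while matching the interval-based config style.
  double impliedBlocksPerMinute() {
    return blocksPerInterval * (60.0 / intervalSecs);
  }
}
{code}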

bq. dfs.namenode.decommission.max.concurrent.tracked.nodes

I agree that it can lead to hangs. At a minimum, I'll add a "0 means no limit" setting, and
maybe we can make that the default. I think that NNs should really have enough heap headroom
to handle the 10s-100s of MBs of memory for this; it's peanuts compared to the 10s of GBs of
heap that are quite typical.
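
For clarity on the "0 means no limit" semantics, a quick sketch (names made up for
illustration):

{code:java}
// Hypothetical sketch of "0 means no limit" for max.concurrent.tracked.nodes.
class TrackedNodeLimit {
  private final int maxTrackedNodes;  // 0 = track everything, never defer
  private int tracked = 0;

  TrackedNodeLimit(int maxTrackedNodes) {
    this.maxTrackedNodes = maxTrackedNodes;
  }

  boolean canTrackAnother() {
    return maxTrackedNodes == 0 || tracked < maxTrackedNodes;
  }

  void onStartTracking() { tracked++; }
  void onStopTracking()  { tracked--; }
}
{code}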

> Refactor and improve decommissioning logic into DecommissionManager
> -------------------------------------------------------------------
>
>                 Key: HDFS-7411
>                 URL: https://issues.apache.org/jira/browse/HDFS-7411
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.5.1
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch,
hdfs-7411.005.patch, hdfs-7411.006.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
