hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
Date Wed, 14 Jan 2015 20:12:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277593#comment-14277593
] 

Colin Patrick McCabe commented on HDFS-7411:
--------------------------------------------

DecomissionManager:

Let's document the locking here.  I believe all of these functions are designed to be run
with the FSNamesystem write lock held, correct?

A question about the data structures here.  We already have a way of iterating through all
the blocks on a given {{DataNode}} via the implicit linked lists in the {{BlockInfo}} objects.
 Shouldn't we be using that here, rather than creating our own list in {{decomNodeBlocks}}?
 This could be a lot of memory saved here, since datanodes might have anywhere between 200k
and a million blocks in the next few years.  (I am fine with doing this in a follow-on, of
course.)

{{DecomissionManager#startDecommission}}: let's have a debug log message here for the case
where the node is already decommissioned... it might help with debugging.  Similarly in {{stopDecomission}}...
if we're not doing anything, let's log why we're not doing anything at debug or trace level.

Config Keys:

We talked about this a bit earlier but didn't really come to any consensus.  I think we should
just get rid of {{dfs.namenode.decommission.nodes.per.interval}} and {{dfs.namenode.decommission.blocks.per.node}},
and just have a configuration key like {{dfs.namenode.decommission.blocks.per.minute}} that
expresses directly what we want.

If we set a reasonable default for {{dfs.namenode.decommission.blocks.per.minute}}, the impact
on users will be minimal.  The old rate-limiting process was broken anyway... that's a big
part of what this patch is designed to fix, as I understand.  So we shouldn't need to lug
around these old config keys that don't really express what we're trying to configure.  Let's
just add a new configuration key that is easy to understand, and maintain in the future. 
Decom is a manual process anywhere where admins get involved

{{dfs.namenode.decommission.max.concurrent.tracked.nodes}}: I have mixed feelings about this
configuration key.  It seems like the reason you want to add it is to limit NN memory consumption
(but you can do that by not duplicating data structures-- see above).  However, it may have
some value for users who would rather finish decomissioning 100 nodes in an hour, than have
1000 nodes 10% decomissioned in that time.  So I guess it is a good addition, maybe?  The
only problem is that if some of those first 100 nodes get stuck because there is an open-for-write
file or something, then the decom process will start to slow down and perhaps eventually hang.

On a related note, I feel like we should have a follow-up change to make the decom parameters
reconfigurable via the configuration reloading interface we added recently.  I will file a
follow-on JIRA for that.

{code}
            if (LOG.isDebugEnabled()) {
              StringBuilder b = new StringBuilder("Node {} ");
              if (isHealthy) {
                b.append("is ");
              } else {
                b.append("isn't ");
              }
              b.append("and still needs to replicate {} more blocks, " +
                  "decommissioning is still in progress.");
{code}
Missing "healthy " in the printout.

Can we log this at info level like every half hour or something, so that people can see issues
with nodes getting "stuck"?  As it is, it seems like they'll get no output unless they monkey
with log4j manually.

{code}
   * Note also that the reference to the list of under-replicated blocks 
   * will be null on initial addx
{code}
Spelling?

{code}
      final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
                                             
          it =
          new CyclicIteration<DatanodeDescriptor, AbstractList<BlockInfo>>(  
                                            
              decomNodeBlocks, iterkey).iterator();                                      
                                
{code}
Well, this is creative, but I think I'd rather have the standard indentation :)

> Refactor and improve decommissioning logic into DecommissionManager
> -------------------------------------------------------------------
>
>                 Key: HDFS-7411
>                 URL: https://issues.apache.org/jira/browse/HDFS-7411
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.5.1
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch,
hdfs-7411.005.patch, hdfs-7411.006.patch
>
>
> Would be nice to split out decommission logic from DatanodeManager to DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message