hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4061) Large number of decommission freezes the Namenode
Date Thu, 20 Nov 2008 20:26:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649468#action_12649468 ]

Konstantin Shvachko commented on HADOOP-4061:
---------------------------------------------

- You changed the time units of {{dfs.namenode.decommission.interval}} from minutes to seconds.
Is this going to be a problem for those who already use this config variable? If they set it to 5
(currently meaning 5 minutes), it will mean every 5 seconds after your patch.
- Do we want to introduce a {{DecommissionMonitor decommissionManager}} member in {{FSNamesystem}}?
Then we will be able to move all decommissioning logic into {{DecommissionMonitor}} (or a manager),
which is probably in part a goal of this patch; see the sketch after this list.
-- {{FSNamesystem.checkDecommissionStateInternal()}} should be moved into {{DecommissionMonitor}};
-- the same goes for {{startDecommission()}} and {{stopDecommission()}}.
- In {{isReplicationInProgress()}}, could you please rename {{decommissionBlocks}} to {{nodeBlocks}}?
It has nothing to do with decommissioning and is confusing.
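
To make the second point concrete, here is a rough sketch (illustrative only, not the actual patch; the constructor, fields, and default value are hypothetical, and only {{startDecommission()}}, {{stopDecommission()}}, and {{checkDecommissionStateInternal()}} correspond to existing methods) of what an {{FSNamesystem}}-owned manager could look like:

{noformat}
// Illustrative sketch only; signatures and the config default are hypothetical.
class DecommissionManager {
  private final FSNamesystem namesystem;
  private final long recheckIntervalMillis;

  DecommissionManager(FSNamesystem namesystem, Configuration conf) {
    this.namesystem = namesystem;
    // Note the unit change discussed above: the value is read as seconds here.
    this.recheckIntervalMillis =
        conf.getInt("dfs.namenode.decommission.interval", 30) * 1000L;
  }

  void startDecommission(DatanodeDescriptor node) {
    // mark the node as decommission-in-progress and schedule replication
  }

  void stopDecommission(DatanodeDescriptor node) {
    // clear the decommissioning state on the node
  }

  boolean checkDecommissionStateInternal(DatanodeDescriptor node) {
    // return true when all of the node's blocks are sufficiently replicated
    return false;
  }
}
{noformat}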

I think this throttling approach will solve the problem for now, but it is not ideal. Say, if
you have 500,000 blocks per node rather than 30,000, then you will have to reconfigure the
throttler to scan even fewer nodes per iteration.
Deleting already-decommissioned blocks, as Raghu proposes, is also not very good. Until the
node is shut down, its blocks can still be accessed for reads; we don't want to change that.
I would rather go with an approach that counts down a node's decommissioned blocks as they are
replicated. Then there is no need to scan all blocks to verify that the node is decommissioned;
just check the counter. We can add the full block scan as a sanity check in {{stopDecommission()}}.
The counter would also be a good indicator of how much decommissioning progress has been made at
any moment.
We should open a separate JIRA for these changes.
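
For illustration, the counter idea could be as simple as the following sketch (the class and
method names are hypothetical, and a real patch would also have to handle blocks that fail and
need to be re-replicated):

{noformat}
// Hypothetical per-node bookkeeping: an O(1) completion check instead of a
// full scan of all of the node's blocks.
class DecommissionCountdown {
  private int blocksPendingReplication;

  DecommissionCountdown(int blocksOnNode) {
    this.blocksPendingReplication = blocksOnNode;  // set at startDecommission()
  }

  void blockFullyReplicated() {
    blocksPendingReplication--;  // called when a block reaches its target replication
  }

  boolean isDecommissionComplete() {
    return blocksPendingReplication <= 0;  // no full block scan needed
  }
}
{noformat}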

> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
>                 Key: HADOOP-4061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4061
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Koji Noguchi
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 4061_20081119.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other
> 1500 nodes were almost empty.
> When decommissioning started, the namenode's queue overflowed every 6 minutes.
> Looking at the CPU usage, every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor
> thread was taking 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
>   public synchronized void decommissionedDatanodeCheck() {
>     for (Iterator<DatanodeDescriptor> it = datanodeMap.values().iterator();
>          it.hasNext();) {
>       DatanodeDescriptor node = it.next();
>       checkDecommissionStateInternal(node);
>     }
>   }
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

