From: "Konstantin Shvachko (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Thu, 20 Nov 2008 12:26:44 -0800 (PST)
Message-ID: <1125091392.1227212804866.JavaMail.jira@brutus>
In-Reply-To: <930258889.1220465144195.JavaMail.jira@brutus>
Subject: [jira] Commented: (HADOOP-4061) Large number of decommission freezes the Namenode
Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/HADOOP-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649468#action_12649468 ]

Konstantin Shvachko commented on HADOOP-4061:
---------------------------------------------

- You changed the time units of {{dfs.namenode.decommission.interval}} from minutes to seconds. Is this going to be a problem for those who already use this config variable? If they set it to 5 (currently meaning 5 minutes), then after your patch the check will run every 5 seconds.
- Do we want to introduce a {{DecommissionMonitor decommissionManager}} member in {{FSNamesystem}}? Then we would be able to move all decommissioning logic into {{DecommissionMonitor}} (or a manager), which is probably partly the goal of this patch.
-- {{FSNamesystem.checkDecommissionStateInternal()}} should be moved into {{DecommissionMonitor}};
-- same for {{startDecommission()}} and {{stopDecommission()}}.
- In {{isReplicationInProgress()}}, could you please rename {{decommissionBlocks}} to {{nodeBlocks}}? It has nothing to do with decommission and is confusing.

I think this throttling approach will solve the problem for now, but it is not ideal. Say, if you have 500,000 blocks rather than 30,000, then you will have to reconfigure the throttler to scan even fewer nodes.

Deleting already decommissioned blocks, as Raghu proposes, is also not very good. Until the node is shut down, its blocks can still be accessed for reads, and we don't want to change that.

I would rather go with an approach that counts down a decommissioning node's remaining blocks as they are replicated. Then there is no need to scan all blocks to verify that the node is decommissioned; just check the counter. We can keep the total block scan as a sanity check in {{stopDecommission()}}. The counter would also be a good indicator of how much decommissioning progress has been made at any moment. We should create a separate jira for these changes.
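The counter-based approach suggested above could look roughly like the following sketch. All class and method names here are illustrative, not Hadoop's actual API: the idea is simply to track the number of under-replicated blocks per decommissioning node, so that checking completion is O(1) instead of a full block scan.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of counter-based decommission tracking.
class DecommissionProgress {
    // node -> number of blocks still waiting to reach full replication
    private final Map<String, Integer> pendingBlocks = new HashMap<>();

    // Initialize the counter once, when decommissioning of the node starts.
    void startDecommission(String node, int blockCount) {
        pendingBlocks.put(node, blockCount);
    }

    // Called when one of the node's blocks finishes replicating elsewhere.
    void blockReplicated(String node) {
        pendingBlocks.computeIfPresent(node, (n, c) -> c - 1);
    }

    // O(1) check instead of scanning every block on the node.
    boolean isDecommissioned(String node) {
        Integer c = pendingBlocks.get(node);
        return c != null && c <= 0;
    }
}
```

A full block scan would then only run once, inside {{stopDecommission()}}, as the sanity check mentioned above.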
> Large number of decommission freezes the Namenode
> -------------------------------------------------
>
>                 Key: HADOOP-4061
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4061
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Koji Noguchi
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 4061_20081119.patch
>
>
> On a 1900-node cluster, we tried decommissioning 400 nodes with 30k blocks each. The other 1500 nodes were almost empty.
> When decommission started, the namenode's queue overflowed every 6 minutes.
> Looking at the CPU usage, it showed that every 5 minutes the org.apache.hadoop.dfs.FSNamesystem$DecommissionedMonitor thread was taking 100% of the CPU for 1 minute, causing the queue to overflow.
> {noformat}
> public synchronized void decommissionedDatanodeCheck() {
>   for (Iterator it = datanodeMap.values().iterator();
>        it.hasNext();) {
>     DatanodeDescriptor node = it.next();
>     checkDecommissionStateInternal(node);
>   }
> }
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
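The loop quoted in the issue description scans every datanode in one synchronized pass, which is what pins the CPU. The throttling idea discussed in the comments can be sketched as bounding the work per monitor run and resuming from where the previous run stopped. This is an illustrative sketch with hypothetical names, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a throttled decommission check: each run examines at
// most nodesPerCheck datanodes, so the lock is held only briefly per run.
class ThrottledDecommissionCheck {
    private final List<String> nodes;   // stand-in for datanodeMap values
    private final int nodesPerCheck;
    private int cursor = 0;             // where the next run resumes

    ThrottledDecommissionCheck(List<String> nodes, int nodesPerCheck) {
        this.nodes = nodes;
        this.nodesPerCheck = nodesPerCheck;
    }

    // One run of the periodic monitor: bounded work instead of a full scan.
    // Returns the nodes checked in this run (for illustration).
    List<String> runOnce() {
        List<String> checked = new ArrayList<>();
        for (int i = 0; i < nodesPerCheck && !nodes.isEmpty(); i++) {
            checked.add(nodes.get(cursor));
            cursor = (cursor + 1) % nodes.size();
        }
        return checked;
    }
}
```

As the comment above notes, this trades latency for throughput: with many more blocks per node, the per-run bound has to be retuned, which is why the counter-based alternative was proposed.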