From: "Hairong Kuang (JIRA)"
To: core-dev@hadoop.apache.org
Date: Mon, 2 Mar 2009 17:29:56 -0800 (PST)
Subject: [jira] Commented: (HADOOP-5145) Balancer sometimes runs out of memory after days or weeks running
Message-ID: <2116459845.1236043796674.JavaMail.jira@brutus>
In-Reply-To: <192285552.1233278639693.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678186#action_12678186 ]

Hairong Kuang commented on HADOOP-5145:
---------------------------------------

I plan to allow the balancer to throw away blocks that were moved more than one and a half hours ago. I will use two windows of moved blocks: a current window and an old window. When a block is added, it is always added to the current window. Between iterations, the balancer tries to clean up old moved blocks. It checks whether the blocks in the old window are at least 1.5 hours old. If so, it purges the old window and moves the blocks in the current window to the old window.

> Balancer sometimes runs out of memory after days or weeks running
> -----------------------------------------------------------------
>
>                 Key: HADOOP-5145
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5145
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>
> The culprit is a HashMap called MovedBlocks. By design, this map does not get cleaned up between iterations, because the deletion of source replicas is done by the NN. When the next iteration starts, source replicas may not have been deleted yet, and the Balancer does not want to schedule them to be moved again. To prevent running out of memory, the Balancer should expire/clean the movedBlocks entries from some iterations back.
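
The two-window expiry proposed in the comment could be sketched roughly as follows. This is only an illustration, not the actual Balancer code: the class name MovedBlocksSketch, its method names, the use of plain long block IDs, and the caller-supplied clock value are all assumptions. It also assumes that the age of the old window can be bounded by the time of the last rotation, since every block in the old window was added before that rotation.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a two-window record of moved blocks.
 * Not the real Balancer implementation; all names are illustrative.
 */
class MovedBlocksSketch {
    private final long windowWidth;            // e.g. 1.5 hours, in the caller's time unit
    private Set<Long> current = new HashSet<>();
    private Set<Long> old = new HashSet<>();
    private long lastRotation;                 // time of the last window rotation

    MovedBlocksSketch(long windowWidth, long now) {
        this.windowWidth = windowWidth;
        this.lastRotation = now;
    }

    /** A newly moved block always goes into the current window. */
    void add(long blockId) {
        current.add(blockId);
    }

    /** A block counts as "recently moved" if it is in either window. */
    boolean contains(long blockId) {
        return current.contains(blockId) || old.contains(blockId);
    }

    /**
     * Called between iterations. Everything in the old window was added
     * before lastRotation, so once a full window width has passed since
     * then, every block in the old window is at least that old.
     */
    void cleanup(long now) {
        if (now - lastRotation >= windowWidth) {
            old = current;                     // drop the old window, demote the current one
            current = new HashSet<>();
            lastRotation = now;
        }
    }

    int size() {
        return current.size() + old.size();
    }
}
```

With this layout a block is remembered for at least one window width and at most roughly two (plus the gap between cleanup calls), so memory use is bounded by the number of blocks moved in about two windows rather than growing for the lifetime of the process.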