From: "Hairong Kuang (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Date: Mon, 29 Oct 2007 19:44:50 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1652) Rebalance data blocks when new data nodes added or data nodes become full
Message-ID: <33169940.1193712290921.JavaMail.jira@brutus>
In-Reply-To: <24604182.1185389251651.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538648 ]

Hairong Kuang commented on HADOOP-1652:
---------------------------------------

Currently the balancer does not check whether the cluster has been "finalizeUpgrade"d or not. I can run a test to see what happens if rebalancing is performed when a cluster has been upgraded but not yet finalized.

> Rebalance data blocks when new data nodes added or data nodes become full
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-1652
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1652
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>    Affects Versions: 0.13.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>         Attachments: balancer.patch, RebalanceDesign4.pdf, RebalanceDesign5.pdf, RebalanceDesign6.pdf
>
>
> When a new data node joins an hdfs cluster, it does not hold much data. So any map task assigned to the machine most likely does not read local data, thus increasing the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed only on non-full data nodes, thus reducing their read parallelism.
> This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster. A solution should meet the following requirements:
> 1. It maintains data availability guarantees in the sense that rebalancing does not reduce the number of replicas that a block has or the number of racks on which the block resides.
> 2. An administrator should be able to invoke and interrupt rebalancing from a command line.
> 3. Rebalancing should be throttled so that it does not cause a namenode to be too busy to serve incoming requests, and does not saturate the network.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
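
Requirement 1 in the issue description can be illustrated with a small check. This is only a sketch under stated assumptions, not the balancer's actual code: the function name, the `rack_of` mapping, and the node names are all hypothetical, and a replica move is modeled as removing the block from one data node and adding it to another.

```python
def move_preserves_availability(replica_nodes, rack_of, source, target):
    """Return True if moving one replica from `source` to `target`
    reduces neither the replica count nor the number of distinct racks.

    replica_nodes: nodes currently holding a replica of the block
    rack_of:       hypothetical mapping from node name to rack name
    """
    current = set(replica_nodes)
    if source not in current:
        return False  # the source holds no replica, so there is nothing to move
    if target in current:
        return False  # the target already holds a replica; the count would drop
    racks_before = {rack_of[node] for node in current}
    after = (current - {source}) | {target}
    racks_after = {rack_of[node] for node in after}
    # The replica count is unchanged by construction; compare rack coverage.
    return len(racks_after) >= len(racks_before)


# Hypothetical topology used only for illustration.
rack_of = {
    "dn1": "rackA", "dn2": "rackA", "dn6": "rackA",
    "dn3": "rackB", "dn4": "rackB",
    "dn5": "rackC",
}
replicas = ["dn1", "dn2", "dn3"]

print(move_preserves_availability(replicas, rack_of, "dn1", "dn4"))
print(move_preserves_availability(replicas, rack_of, "dn3", "dn6"))
```

In this example, moving the replica from dn1 to dn4 keeps the block on two racks, while moving it from dn3 to dn6 would leave every replica on rackA, so a balancer following requirement 1 would have to reject the second move.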