Date: Mon, 12 May 2008 10:53:16 -0700
Subject: Re: Balancer not balancing 100%?
From: Hairong Kuang

Please check the balancer user guide at
http://issues.apache.org/jira/secure/attachment/12370966/BalancerUserGuide2.pdf.
As stated in the document, a cluster is balanced iff
|utilization(DNi) - average utilization| < threshold [...]

[...] wrote:
>
> I think the balancer has a pretty lenient feeling about what "balanced"
> means.
>
> If you want to shave off the last slivers, try the trick of increasing
> replication on each file, one at a time, and then decreasing it after 30-60
> seconds. You can do this at whatever rate your disk space limits you to
> (i.e., if your disk is 80% full, you can double the replication on 1/4 of
> your files without running out of disk).
>
>
> On 5/11/08 11:48 AM, "Otis Gospodnetic" wrote:
>
>> Oh, and on top of the above, I just observed that even though bin/hadoop
>> balancer exits immediately and reports the cluster is fully balanced, I do
>> see *very* few blocks (1-2 blocks per node) getting moved every time I run
>> balancer. It feels as if the balancer does actually find some blocks that it
>> could move around, moves them, but then quickly gets lazy and just exits
>> claiming the cluster is/was already balanced. I just ran balancer about 10
>> times and each time it moved a couple of blocks and then exited.
>>
>> Makes me want to do ugly stuff like:
>> for ((i=1; i <= 9999; i++)); do echo $i; bin/hadoop balancer; done
>>
>>
>> ...just to get to the point where all 4 nodes have the same number of blocks
>> and thus the same percentage of disk used...
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>> ----- Original Message ----
>>> From: Otis Gospodnetic
>>> To: core-user@hadoop.apache.org
>>> Sent: Sunday, May 11, 2008 2:36:24 PM
>>> Subject: Balancer not balancing 100%?
>>>
>>> Hi,
>>>
>>> I have 4 identical nodes in a Hadoop cluster (all functioning as DNs). One
>>> of the 4 nodes is a new node that I recently added. I ran the balancer a
>>> few times and it did move some of the blocks from the other 3 nodes to the
>>> new node. However, the 4 nodes are still not 100% balanced (according to
>>> the GUI), even though running bin/hadoop balancer says the cluster is
>>> balanced:
>>>
>>> Time Stamp    Iteration#    Bytes Already Moved    Bytes Left To Move    Bytes Being Moved
>>> The cluster is balanced. Exiting...
>>> Balancing took 666.0 milliseconds
>>>
>>>
>>> The 3 old DNs are about 60% full (around 24K blocks), while the 1 new DN is
>>> only about 50% full (around 21K blocks). I restarted the NN and re-ran the
>>> balancer, but got the same output: "The cluster is balanced. Exiting..."
>>>
>>> Is this a bug or is it somehow possible for a cluster to be balanced, yet
>>> have nodes with different numbers of blocks?
>>>
>>> Thanks,
>>> Otis
>>
>
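
For illustration, here is a minimal sketch of the balance criterion quoted from the user guide, applied to the figures in the original question (three DNs at about 60% and one at about 50%). The function name is_balanced is made up for this example, and the default threshold of 10 percentage points is an assumption based on the balancer's documented default. This is not the balancer's actual code.

```python
def is_balanced(utilizations, threshold=10.0):
    """Apply the criterion |utilization(DNi) - average utilization| < threshold.

    utilizations: per-datanode disk utilization, in percent.
    threshold: allowed deviation in percentage points (assumed default: 10).
    Returns (balanced, average, deviations).
    """
    average = sum(utilizations) / len(utilizations)
    deviations = [abs(u - average) for u in utilizations]
    balanced = all(d < threshold for d in deviations)
    return balanced, average, deviations


# Figures from the original question: 3 old DNs ~60% full, 1 new DN ~50% full.
nodes = [60.0, 60.0, 60.0, 50.0]
balanced, average, deviations = is_balanced(nodes)
print(average)     # 57.5
print(deviations)  # [2.5, 2.5, 2.5, 7.5]
print(balanced)    # True

# A tighter (hypothetical) threshold of 5 would flag the new node.
print(is_balanced(nodes, threshold=5.0)[0])  # False
```

Under these assumptions the largest deviation is 7.5 points, which is inside the threshold, so "The cluster is balanced" is the expected answer rather than a bug. The criterion compares disk utilization, not block counts, so nodes can hold different numbers of blocks and still count as balanced. If a tighter balance is wanted, the balancer accepts a smaller value through its -threshold option.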