From: Mathias Herberts
To: common-user@hadoop.apache.org
Date: Thu, 5 May 2011 14:57:26 +0200
Subject: Re: distcp performing much better for rebalancing than dedicated balancer
In-Reply-To: <4DC2987C.1010609@kalooga.com>

Did you explicitly start a balancer, or did you decommission the nodes
using dfs.hosts.exclude and a dfsadmin -refreshNodes?

On Thu, May 5, 2011 at 14:30, Ferdy Galema wrote:
> Hi,
>
> On our 15-node cluster (1GB ethernet and 4x1TB disks per node) I noticed
> that distcp does a much better job at rebalancing than the dedicated
> balancer does. We needed to decommission 11 nodes, so that prior to
> rebalancing we had 4 used and 11 empty nodes. The 4 used nodes had about
> 25% usage each. Most of our files are of average size: We have about 500K
> files in 280K blocks and 800K blocks total (blocksize is 64MB).
>
> So I changed dfs.balance.bandwidthPerSec to 800100100 and restarted the
> cluster. I started the balancer tool and noticed that it moved about
> 200GB in 1 hour. (I grepped the balancer log for "Need to move".)
>
> After stopping the balancer I started a distcp. This tool copied 900GB in
> just 45 minutes, with an average replication of 2, so its total
> throughput was around 2.4 TB/hour.
> Fair enough, it is not purely rebalancing, because the 4 overused nodes
> also get new blocks; still, it performs much better. Munin confirms the
> much higher disk/ethernet throughput of the distcp.
>
> Are these characteristics to be expected? Either way, can the balancer be
> boosted even more (aside from the dfs.balance.bandwidthPerSec property)?
>
> Ferdy.
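
For reference, the throughput figures quoted above can be checked with a few lines of Python. This is only a rough sketch: the helper name throughput_tb_per_hour is made up for illustration, decimal units (1 TB = 1000 GB) are assumed, and "1GB ethernet" is assumed to mean a 1 Gbit/s link. The only Hadoop-specific fact used is that dfs.balance.bandwidthPerSec is given in bytes per second per datanode.

```python
# Figures are the ones quoted in the message; names are illustrative only.
GB_PER_TB = 1000  # assumption: decimal units, as the 2.4 TB/hour figure implies


def throughput_tb_per_hour(gb_moved, minutes, replication=1):
    """Return the total data written across the cluster, in TB per hour."""
    return gb_moved * replication / GB_PER_TB / (minutes / 60)


# Balancer: about 200 GB moved in 1 hour (each block is moved once).
balancer = throughput_tb_per_hour(200, 60)

# distcp: 900 GB copied in 45 minutes, each block written twice (replication 2).
distcp = throughput_tb_per_hour(900, 45, replication=2)

# dfs.balance.bandwidthPerSec is expressed in bytes per second per datanode.
configured_cap_mb_s = 800100100 / 1e6

# Assumption: "1GB ethernet" means a 1 Gbit/s link, i.e. 125 MB/s at best.
gigabit_link_mb_s = 1e9 / 8 / 1e6

print(f"balancer: {balancer:.1f} TB/hour")
print(f"distcp:   {distcp:.1f} TB/hour ({distcp / balancer:.0f}x the balancer)")
print(f"configured cap: {configured_cap_mb_s:.0f} MB/s, "
      f"link: {gigabit_link_mb_s:.0f} MB/s")
```

Under these assumptions the distcp run comes out at roughly 12 times the balancer's rate, and the configured bandwidth cap is already well above what a single gigabit link can carry, which suggests the cap itself was not what limited the balancer here.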