Subject: Re: disk used percentage is not symmetric on datanodes (balancer)
From: Tapas Sarangi
Date: Tue, 19 Mar 2013 09:55:46 -0500
To: user@hadoop.apache.org

On Mar 18, 2013, at 11:50 PM, Harsh J wrote:

> What do you mean that the balancer is always active?

Meaning that the same process is active for a long time; the process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that is not in effect because the process may never exit.

> It is to be used as a tool and it exits once it balances in a specific
> run (loops until it does, but always exits at the end). The balancer
> does balance based on usage percentage, so that is what you're probably
> looking for/missing.

Maybe. How does the balancer look at the usage percentage?

-Tapas

> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi wrote:
>> Hi,
>>
>> On Mar 18, 2013, at 8:21 PM, 李洪忠 wrote:
>>
>> Maybe you need to modify the rack-awareness script to balance the
>> racks, i.e., make all the racks the same size: one rack of 6 small
>> nodes, one rack of 1 large node.
>> P.S.
>> You need to reboot the cluster for the rack-awareness script change
>> to take effect.
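[For context on the rack-awareness ("rackware") script mentioned above: in Hadoop 0.20 it is any executable pointed to by the `topology.script.file.name` property that maps datanode addresses to rack paths. A minimal hypothetical sketch in Python; the host names and rack names are made up for illustration, not taken from this cluster:]

```python
#!/usr/bin/env python
# Hypothetical topology script: Hadoop invokes it with one or more
# datanode IPs/hostnames as arguments and expects one rack path per
# argument, space-separated, on stdout.
import sys

# Example mapping (made up): small nodes grouped on one rack,
# the large node on another, as the advice above suggests.
RACKS = {
    "node-small-1": "/rack-small",
    "node-small-2": "/rack-small",
    # ... remaining small nodes would be listed here ...
    "node-large-1": "/rack-large",
}

DEFAULT_RACK = "/default-rack"

def resolve(node):
    # Unknown nodes fall back to the default rack.
    return RACKS.get(node, DEFAULT_RACK)

if __name__ == "__main__":
    print(" ".join(resolve(arg) for arg in sys.argv[1:]))
```

[Any executable with this input/output contract works; shell is the more common choice, Python is used here only to keep the thread's examples in one language.]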
>>
>> Like I mentioned earlier in my reply to Bertrand, we haven't
>> considered rack awareness for the cluster; currently it is treated as
>> just one rack. Can that be the problem? I don't know…
>>
>> -Tapas
>>
>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>>
>> And by active, do you mean that it does actually stop by itself?
>> Otherwise it might mean that the throttling/limit is an issue with
>> regard to the data volume or velocity.
>>
>> What threshold is used?
>>
>> About the small and big datanodes, how are they distributed with
>> regard to racks?
>> About files, how are the replication factor(s) and block size(s) used?
>>
>> Surely trivial questions again.
>>
>> Bertrand
>>
>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi wrote:
>>>
>>> Hi,
>>>
>>> Sorry about that, had it written, but thought it was obvious.
>>> Yes, the balancer is active and running on the namenode.
>>>
>>> -Tapas
>>>
>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux wrote:
>>>
>>> Hi,
>>>
>>> It is not explicitly said, but did you use the balancer?
>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi wrote:
>>>>
>>>> Hello,
>>>>
>>>> I am using one of the old legacy versions (0.20) of Hadoop for our
>>>> cluster. We have scheduled an upgrade to a newer version within a
>>>> couple of months, but I would like to understand a couple of things
>>>> before moving towards the upgrade plan.
>>>>
>>>> We have about 200 datanodes and some of them have larger storage
>>>> than others. The storage for the datanodes varies between 12 TB and
>>>> 72 TB.
>>>>
>>>> We found that the disk-used percentage is not symmetric across all
>>>> the datanodes.
>>>> For larger storage nodes the percentage of disk space used is much
>>>> lower than that of other nodes with smaller storage space. In larger
>>>> storage nodes the percentage of used disk space varies, but is on
>>>> average about 30-50%. For the smaller storage nodes this number is
>>>> as high as 99.9%. Is this expected? If so, then we are not using a
>>>> lot of the disk space effectively. Is this solved in a future
>>>> release?
>>>>
>>>> If not, I would like to know whether there are any checks/debugging
>>>> steps one can do to find an improvement with the current version, or
>>>> whether upgrading Hadoop should solve this problem.
>>>>
>>>> I am happy to provide additional information if needed.
>>>>
>>>> Thanks for any help.
>>>>
>>>> -Tapas
>>
>> --
>> Bertrand Dechoux

> --
> Harsh J
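[Since the thread hinges on why capacity-blind block placement skews percent-used on a heterogeneous cluster, here is a toy simulation — not Hadoop code, and the node counts/sizes are made up to echo the 12 TB / 72 TB mix above. The final comparison mirrors the balancer's documented rule: a node is over- or under-utilized when its used/capacity ratio is more than the threshold (the balancer's -threshold option, default 10%) away from the cluster-wide average.]

```python
# Toy model: place blocks uniformly at random across datanodes,
# ignoring capacity, then compare each node's utilization with the
# cluster average the way the balancer's threshold check does.
import random

random.seed(42)

# Hypothetical cluster: six small nodes (12 TB) and one large node (72 TB).
capacities = [12.0] * 6 + [72.0]   # TB
used = [0.0] * len(capacities)

block_tb = 0.01                    # ~10 GB placed per step
while True:
    i = random.randrange(len(capacities))   # capacity-blind placement
    if used[i] + block_tb > capacities[i]:
        break                               # a small node just filled up
    used[i] += block_tb

utilization = [u / c for u, c in zip(used, capacities)]
cluster_avg = sum(used) / sum(capacities)

threshold = 0.10   # fraction, i.e. the balancer's default of 10%
over = [i for i, u in enumerate(utilization) if u > cluster_avg + threshold]
under = [i for i, u in enumerate(utilization) if u < cluster_avg - threshold]
```

[Under this model the small nodes saturate near 100% while the large node stays far below the cluster average, which is exactly the asymmetry reported; both groups fall outside the threshold band, and the balancer's job would be to move blocks from the small nodes to the large one until every node is back within it.]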