From: Tapas Sarangi
To: user@hadoop.apache.org
Subject: Re: disk used percentage is not symmetric on datanodes (balancer)
Date: Mon, 18 Mar 2013 20:26:48 -0500

Hi,

On Mar 18, 2013, at 8:21 PM, 李洪忠 <lhztop@hotmail.com> wrote:

> Maybe you need to modify the rack-awareness script to balance the racks, i.e., make all the racks the same size: one rack of 6 small nodes, one rack of 1 large node.
> P.S.
> You need to restart the cluster after modifying the rack-awareness script.

As I mentioned earlier in my reply to Bertrand, we haven't set up rack awareness for the cluster; currently it is treated as just one rack. Can that be the problem? I don't know…

-Tapas

> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>> And by active, do you mean that it actually stops by itself? Otherwise the throttling/limit might be an issue given the data volume or velocity.
>>
>> What threshold is used?
>>
>> About the small and big datanodes, how are they distributed with regard to racks?
>> About files, what replication factor(s) and block size(s) are used?
>>
>> Surely trivial questions again.
>>
>> Bertrand
>>
>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>> Hi,
>>
>> Sorry about that, I had it written, but thought it was obvious.
>> Yes, the balancer is active and running on the namenode.
>>
>> -Tapas
>>
>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> It is not explicitly said, but did you use the balancer?
>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>> Hello,
>>>
>>> I am using one of the old legacy versions (0.20) of Hadoop for our cluster. We have scheduled an upgrade to the newer version within a couple of months, but I would like to understand a couple of things before moving towards the upgrade plan.
>>>
>>> We have about 200 datanodes and some of them have larger storage than others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>
>>> We found that the disk-used percentage is not symmetric across the datanodes. For larger storage nodes the percentage of disk space used is much lower than that of nodes with smaller storage space. On the larger storage nodes the percentage of used disk space varies, but is on average about 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is this expected? If so, then we are not using a lot of the disk space effectively. Is this solved in a future release?
>>>
>>> If not, I would like to know whether there are any checks or debugging steps that could improve things on the current version, or whether upgrading Hadoop should solve this problem.
>>>
>>> I am happy to provide additional information if needed.
>>>
>>> Thanks for any help.
>>>
>>> -Tapas
>>>
>>
>> --
>> Bertrand Dechoux
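
The threshold Bertrand asks about can be checked against per-node numbers. Below is a minimal sketch in Python, assuming the balancer rule as documented for HDFS: a datanode counts as balanced when its used percentage is within the threshold of the cluster-wide used percentage (the documented default threshold is 10). The function name, node names, and sizes are hypothetical, chosen only to mirror the 12 TB and 72 TB nodes described in the thread; this is not output from the actual cluster.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Classify datanodes relative to cluster-wide utilization.

    nodes: dict mapping node name -> (used_tb, capacity_tb).
    Returns (cluster_used_pct, {name: (node_used_pct, status)}).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    # Cluster utilization is total used space over total capacity,
    # not the mean of the per-node percentages.
    cluster_pct = 100.0 * total_used / total_cap

    report = {}
    for name, (used, cap) in nodes.items():
        node_pct = 100.0 * used / cap
        if node_pct > cluster_pct + threshold_pct:
            status = "over"      # candidate source of blocks
        elif node_pct < cluster_pct - threshold_pct:
            status = "under"     # candidate target for blocks
        else:
            status = "balanced"
        report[name] = (node_pct, status)
    return cluster_pct, report


# Hypothetical sample: two small (12 TB) and two large (72 TB) nodes.
sample = {
    "small-1": (11.9, 12.0),
    "small-2": (11.5, 12.0),
    "large-1": (28.8, 72.0),
    "large-2": (21.6, 72.0),
}

cluster_pct, report = classify_nodes(sample)
print(f"cluster used: {cluster_pct:.1f}%")
for name, (pct, status) in sorted(report.items()):
    print(f"{name}: {pct:.1f}% {status}")
```

Because the comparison is on percentages rather than absolute terabytes, a nearly full small node and a half-empty large node can both sit far from the cluster figure at the same time, which is the asymmetry described in the original question.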