Subject: Re: disk used percentage is not symmetric on datanodes (balancer)
From: Алексей Бабутин <zorlaxpokemonych@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 22 Mar 2013 20:05:34 +0400

2013/3/20 Tapas Sarangi <tapas.sarangi@gmail.com>
> Thanks for your reply. Some follow-up questions below:

> On Mar 20, 2013, at 5:35 AM, Алексей Бабутин <zorlaxpokemonych@gmail.com> wrote:

>> dfs.balance.bandwidthPerSec in hdfs-site.xml. I think the balancer can't help you, because it makes all the nodes equal. They can differ only by the balancer threshold, which is 10 by default. That means nodes can differ by up to 350 TB from each other in a 3.5 PB cluster; with a threshold of 1, by up to 35 TB, and so on.

> If we use multiple racks, let's assume we have 10 racks now and they are equally divided in size (350 TB each). With a default threshold of 10, any two nodes on a given rack will have a maximum difference of 35 TB, is this correct? Also, does this mean the difference between any two racks will also go down to 35 TB?

The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks. You can see how it works in Balancer.java, in BalancerDatanode, around line 509.

I was wrong about the 350 TB / 35 TB figures; it actually calculates things this way:

For example:
cluster_capacity = 3.5 PB
cluster_dfsused = 2 PB

avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% used cluster capacity
Then we know each node's utilization (node_dfsused / node_capacity * 100). The balancer considers everything fine if avgutil + 10 > node_utilization >= avgutil - 10.

In the ideal case every node would use avgutil of its capacity, but that is only about 6.9 TB on a 12 TB node and about 41 TB on a 72 TB node.

The balancer can't help you.
Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
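To make the arithmetic concrete, here is a rough sketch of that check. It is a simplification of what Balancer.java really does, and the class name and node figures below are only illustrative:

// Simplified sketch of the balancer's threshold check; not the real
// Balancer.java code, just the arithmetic described above.
public class BalancerThresholdSketch {

    // A node is "balanced enough" when its utilization lies within
    // +/- threshold percentage points of the average cluster utilization.
    static boolean withinThreshold(double usedTb, double capacityTb,
                                   double avgUtil, double threshold) {
        double nodeUtil = usedTb / capacityTb * 100.0;
        return nodeUtil < avgUtil + threshold && nodeUtil >= avgUtil - threshold;
    }

    public static void main(String[] args) {
        double clusterCapacityTb = 3500.0;  // 3.5 PB
        double clusterUsedTb = 2000.0;      // 2 PB
        double avgUtil = clusterUsedTb / clusterCapacityTb * 100.0;  // ~57.14%
        double threshold = 10.0;            // balancer default

        // Both nodes are "balanced" at ~57% utilization, yet they hold very
        // different absolute amounts of data (~6.9 TB vs ~41 TB).
        System.out.printf("avg util = %.2f%%%n", avgUtil);
        System.out.printf("12 TB node with 6.9 TB used: %b%n",
                withinThreshold(6.9, 12.0, avgUtil, threshold));
        System.out.printf("72 TB node with 41.0 TB used: %b%n",
                withinThreshold(41.0, 72.0, avgUtil, threshold));
    }
}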



>> In the ideal case with replication factor 2, with two nodes of 12 TB and 72 TB you will be able to store only 12 TB of replicated data.

> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
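That point can be checked with a bit of arithmetic: ignoring rack placement rules, the data you can keep with two replicas is capped both by half of the total capacity and by the capacity outside the largest node. A small sketch, where the second node list is just an illustration and not your cluster:

// Upper bound on how much data fits with replication factor 2, assuming
// the two replicas of a block must land on different nodes and ignoring
// rack placement rules. Node sizes are illustrative.
public class Rf2CapacitySketch {

    static double maxReplicatedDataTb(double[] nodeCapacitiesTb) {
        double total = 0.0, largest = 0.0;
        for (double c : nodeCapacitiesTb) {
            total += c;
            largest = Math.max(largest, c);
        }
        // Every block stores at most one replica on the largest node, and
        // two replicas together consume twice the logical data size.
        return Math.min(total / 2.0, total - largest);
    }

    public static void main(String[] args) {
        // Two nodes of 12 TB and 72 TB: only 12 TB of replicated data fits.
        System.out.println(maxReplicatedDataTb(new double[] {12, 72}));
        // With six 12 TB nodes plus the 72 TB node, the limit grows to 72 TB.
        System.out.println(maxReplicatedDataTb(new double[] {12, 12, 12, 12, 12, 12, 72}));
    }
}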


>> The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and the racks must have identical total capacity.
>> For example:

>> rack1: 1 node with 72 TB
>> rack2: 6 nodes with 12 TB
>> rack3: 3 nodes with 24 TB

>> It helps with balancing, because a duplicated block must go to another rack (each rack above totals 72 TB).

> The same question I asked earlier in this message: does using multiple racks with the default balancer threshold minimize the difference between racks?

>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better choice.

> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.

