From: Tapas Sarangi <tapas.sarangi@gmail.com>
To: user@hadoop.apache.org
Subject: Re: disk used percentage is not symmetric on datanodes (balancer)
Date: Sun, 24 Mar 2013 13:32:28 -0500

Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.

The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it is in bits then we have a problem. What's the unit for "dfs.balance.bandwidthPerSec"?

-----
On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <lists@balajin.net> wrote:

> Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.
>
> On 24 March 2013 09:21, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
> Thanks for the follow up.
> I don't know whether the attachment will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>
> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>
> Recently we have been facing a crisis frequently: 'hdfs' goes into a mode where it is not able to write anything further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>
> Thanks
> ------
>
>> The balancer knows about topology, but when it calculates balancing it operates only with nodes, not with racks.
>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>
>> I was wrong about 350 TB / 35 TB; it calculates it this way:
>>
>> For example:
>> cluster_capacity = 3.5 PB
>> cluster_dfsused = 2 PB
>>
>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>> Then we know the average node utilization (node_dfsused / node_capacity * 100). The balancer thinks everything is fine if avgutil + 10 > node_utilization >= avgutil - 10.
>>
>> In the ideal case every node would use avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about 41 TB.
>>
>> The balancer can't help you.
>>
>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>
>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to have only 12 TB of replicated data.
>>
>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
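The avgutil threshold rule quoted above (a node counts as balanced when its utilization lies within plus or minus the threshold of the cluster average) can be sketched with the numbers from this thread. This is a hypothetical illustration only, not the actual code from Balancer.java; the class and method names are made up:

```java
// Hypothetical sketch of the utilization-threshold rule described above.
// Not the real Balancer.java code; names and structure are invented for
// illustration. Sizes are in TB, threshold in percentage points.
public class BalancerThresholdSketch {

    // Cluster average utilization in percent: dfsused / capacity * 100.
    static double avgUtil(double dfsUsedTb, double capacityTb) {
        return dfsUsedTb / capacityTb * 100.0;
    }

    // A node is considered balanced when
    //   avgutil + threshold > node_utilization >= avgutil - threshold
    static boolean isBalanced(double nodeUsedTb, double nodeCapTb,
                              double avgUtil, double threshold) {
        double nodeUtil = nodeUsedTb / nodeCapTb * 100.0;
        return nodeUtil < avgUtil + threshold
            && nodeUtil >= avgUtil - threshold;
    }

    public static void main(String[] args) {
        // 2 PB used out of 3.5 PB capacity, as in the example above.
        double avg = avgUtil(2000.0, 3500.0);   // about 57.14%
        System.out.printf("avgutil = %.2f%%%n", avg);

        // A 12 TB node near the cluster average (~6.9 TB used) is
        // balanced, as is a 72 TB node at ~41 TB used.
        System.out.println(isBalanced(6.9, 12.0, avg, 10.0));
        System.out.println(isBalanced(41.0, 72.0, avg, 10.0));

        // A completely full 12 TB node is far outside the +/-10 band.
        System.out.println(isBalanced(12.0, 12.0, avg, 10.0));
    }
}
```

Note how, with the default threshold of 10, the check is entirely in percentages: a 12 TB node and a 72 TB node can both be "balanced" while differing by tens of terabytes in absolute free space, which matches the behavior described in this thread.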
>>
>>> In my opinion, the best way is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity.
>>> For example:
>>>
>>> rack1: 1 node with 72 TB
>>> rack2: 6 nodes with 12 TB
>>> rack3: 3 nodes with 24 TB
>>>
>>> It helps with balancing, because a duplicated block must be on another rack.
>>
>> The same question I asked earlier in this message: with multiple racks, does the default threshold for the balancer minimize the difference between racks?
>>
>>> Why did you select hdfs? Maybe lustre, cephfs, or something else is a better choice.
>>
>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and am trying to understand a few issues. I will explore the other options you mentioned.
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan