Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of azuryyyu@gmail.com designates
 209.85.216.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAHpih6ydWa2sxHXVUm8qFYkKbw4Obe68B28_Pz8GtcWhE17LSg@mail.gmail.com>
References: 
 <CAHpih6wf3wyY-MqNB6W0kLg3LGWePvj7vffxRSMvz_fzfWsaVw@mail.gmail.com>
	<CAOcnVr1W6EkuPhi3uCrOyj9d7vQLFxthvs5UYTG+=PmMfacnKA@mail.gmail.com>
	<CAHpih6ydWa2sxHXVUm8qFYkKbw4Obe68B28_Pz8GtcWhE17LSg@mail.gmail.com>
Date: Wed, 5 Mar 2014 15:58:58 +0800
Message-ID: 
 <CALr1C9pZJ_-uM3sRurAm7ZA0q4M-roYHU_mtgbLH-EBSPAEWsw@mail.gmail.com>
Subject: Re: Question on DFS Balancing
From: Azuryy Yu <azuryyyu@gmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=001a11c136ca9e107004f3d7649f

--001a11c136ca9e107004f3d7649f
Content-Type: text/plain; charset=ISO-8859-1

Hi,
Applying the patch from 2.x to 0.20.x will probably break something, but
it depends.

AFAIK, the Balancer had a major refactor in HDFSv2, so you'd be better off
writing the fix yourself, based on HDFS-1804.
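
To show the idea behind HDFS-1804 (a volume-choosing policy that looks at
free space), here is a rough sketch comparing round-robin placement, which
as far as I know is what 0.20.x does, with a "most free space first" choice.
It is a simplified illustration in Python, not Hadoop code: Volume,
make_round_robin, choose_most_available and simulate are hypothetical names,
and the actual patch is more involved than this.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Volume:
    """Hypothetical stand-in for one datanode data directory (one disk)."""
    path: str
    capacity: int  # total size, in arbitrary units
    used: int = 0  # space already occupied, same units

    @property
    def available(self) -> int:
        return self.capacity - self.used


def make_round_robin():
    """Return a chooser that rotates through volumes regardless of free space."""
    counter = count()

    def choose(volumes, block_size):
        # Try each volume at most once, skipping ones that cannot fit the block.
        for _ in range(len(volumes)):
            vol = volumes[next(counter) % len(volumes)]
            if vol.available >= block_size:
                return vol
        raise OSError("no volume has room for the block")

    return choose


def choose_most_available(volumes, block_size):
    """Pick the volume with the most free space that can still fit the block."""
    candidates = [v for v in volumes if v.available >= block_size]
    if not candidates:
        raise OSError("no volume has room for the block")
    return max(candidates, key=lambda v: v.available)


def simulate(choose, volumes, block_size, n_blocks):
    """Write n_blocks blocks using the given chooser; return used space per volume."""
    for _ in range(n_blocks):
        choose(volumes, block_size).used += block_size
    return [v.used for v in volumes]
```

With one disk at 90 of 100 units and an empty second disk, writing 20 blocks
of 1 unit fills the first disk under round-robin, while the free-space choice
sends every block to the second disk.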


On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <divs.sheth@gmail.com> wrote:

> Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop
> 0.20.2 (we are in the process of upgrading). Is there a short-term
> workaround to balance the disk utilization? If I apply the patch from the
> JIRA to the version I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <harsh@cloudera.com> wrote:
>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <divs.sheth@gmail.com> wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question
>> > I have is related to balancing. I have a 5 datanode cluster and each
>> > node has 2 disks attached to it. The second disk was added when the
>> > first disk was reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added hadoop
>> > automatically moved over some data to the new disk. But over the time I
>> > notice that data is no longer being written to the second disk. I have
>> > also faced an issue on the datanode where the first disk had 100%
>> > utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance the
>> > disk utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>>
>
>

--001a11c136ca9e107004f3d7649f--