hadoop-user mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Question on DFS Balancing
Date Wed, 05 Mar 2014 08:49:48 GMT
You can write a simple tool to move blocks peer to peer. I had such a tool
before, but I cannot find it now.

Background: our cluster was not balanced and the balancer was very slow, so I
wrote this tool to move blocks from one node to another.
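
For the case asked about in this thread (two disks on a single datanode), a
dry-run planner could look like the sketch below. It is not the tool mentioned
above, only a minimal illustration: the Disk class, the plan_moves function,
the block ids, and the 5% tolerance are all hypothetical, and the code touches
no files and calls no Hadoop API. It only decides which blocks one might move
from the fuller disk to the emptier one. As far as I know, the manual procedure
usually described is to stop the DataNode, move each block file together with
its .meta file into the other data directory, and restart the DataNode, so any
real move based on such a plan should follow those precautions.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical model of one datanode data directory."""
    name: str
    capacity: int                                # bytes
    blocks: dict = field(default_factory=dict)   # block id -> size in bytes

    @property
    def used(self) -> int:
        return sum(self.blocks.values())


def plan_moves(src: Disk, dst: Disk, tolerance: float = 0.05):
    """Return [(block_id, size), ...] to move from src to dst.

    Greedy, largest block first. Stops once the utilization gap is within
    `tolerance`, and skips any block whose move would leave dst fuller
    than src by more than `tolerance`. Nothing is modified (dry run).
    """
    src_used, dst_used = src.used, dst.used
    moves = []
    by_size = sorted(src.blocks.items(), key=lambda kv: kv[1], reverse=True)
    for block_id, size in by_size:
        gap = src_used / src.capacity - dst_used / dst.capacity
        if gap <= tolerance:
            break
        new_src = (src_used - size) / src.capacity
        new_dst = (dst_used + size) / dst.capacity
        if new_dst > new_src + tolerance:
            continue  # this block would overshoot; try a smaller one
        moves.append((block_id, size))
        src_used -= size
        dst_used += size
    return moves


if __name__ == "__main__":
    # Made-up sizes purely for illustration.
    disk1 = Disk("disk1", 1000, {"blk_1": 400, "blk_2": 300, "blk_3": 200})
    disk2 = Disk("disk2", 1000)
    for block_id, size in plan_moves(disk1, disk2):
        print(f"move {block_id} ({size} bytes): {disk1.name} -> {disk2.name}")
```

The planner ignores the size of the .meta files and assumes the block list is
taken while the DataNode is not writing to the disks.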


On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <divs.sheth@gmail.com> wrote:

> I won't be in a position to fix that based on HDFS-1804, as we are
> upgrading to CDH4 in the coming month. I just wanted a short-term solution. I
> have read somewhere that manually moving the blocks would help. Could
> someone guide me through the exact steps or precautions I should take while
> doing this? Data loss is a NO NO for me.
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>
>> Hi,
>> Applying the patch from 2.x to 0.20.x will probably break something,
>> but it depends.
>>
>> AFAIK, the Balancer had a major refactor in HDFSv2, so you had better fix it
>> yourself based on HDFS-1804.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <divs.sheth@gmail.com> wrote:
>>
>>> Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using
>>> Hadoop 0.20.2 (we are in the process of upgrading). Is there a short-term
>>> workaround to balance the disk utilization? Will the patch in the JIRA
>>> break anything if applied to the version that I am using?
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <harsh@cloudera.com> wrote:
>>>
>>>> You're probably looking for
>>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>>
>>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <divs.sheth@gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > I am new to the mailing list.
>>>> >
>>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
>>>> > have is related to balancing. I have a 5 datanode cluster and each node has
>>>> > 2 disks attached to it. The second disk was added when the first disk was
>>>> > reaching its capacity.
>>>> >
>>>> > Now the scenario that I am facing is, when the new disk was added Hadoop
>>>> > automatically moved over some data to the new disk. But over time I
>>>> > notice that data is no longer being written to the second disk. I have also
>>>> > faced an issue on the datanode where the first disk had 100% utilization.
>>>> >
>>>> > How can I overcome such a scenario? Is it not Hadoop's job to balance the
>>>> > disk utilization between multiple disks on a single datanode?
>>>> >
>>>> > Thanks
>>>> > Divye Sheth
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>
