hadoop-user mailing list archives

From Jamal B <jm151...@gmail.com>
Subject Re: disk used percentage is not symmetric on datanodes (balancer)
Date Mon, 25 Mar 2013 02:09:22 GMT
Yes
On Mar 24, 2013 9:25 PM, "Tapas Sarangi" <tapas.sarangi@gmail.com> wrote:

> Thanks. Does this need a restart of hadoop on the nodes where this
> modification is made?
>
> -----
>
> On Mar 24, 2013, at 8:06 PM, Jamal B <jm15119b@gmail.com> wrote:
>
> dfs.datanode.du.reserved
>
> You could tweak that param on the smaller nodes to "force" the flow of
> blocks to other nodes. A short-term hack at best, but it should help the
> situation a bit.
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <tapas.sarangi@gmail.com> wrote:
>
>>
>> On Mar 24, 2013, at 4:34 PM, Jamal B <jm15119b@gmail.com> wrote:
>>
>> It shouldn't cause further problems since most of your small nodes are
>> already at their capacity. You could set or increase the dfs reserved
>> property on your smaller nodes to force the flow of blocks onto the larger
>> nodes.
>>
>>
>> Thanks.  Can you please specify which are the dfs properties that we can
>> set or modify to force the flow of blocks directed towards the larger nodes
>> than the smaller nodes ?
>>
>> -----
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <tapas.sarangi@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the idea, I will give this a try and report back.
>>>
>>> My worry is if we decommission a small node (one at a time), will it
>>> move the data to larger nodes or choke other smaller nodes? In
>>> principle it should distribute the blocks; the point is it is not
>>> distributing the way we expect it to, so do you think this may cause
>>> further problems?
>>>
>>> ---------
>>>
>>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm15119b@gmail.com> wrote:
>>>
>>> Then I think the only way around this would be to decommission the
>>> smaller nodes one at a time, and ensure that the blocks are moved to
>>> the larger nodes.
>>>
>>> And once complete, bring the smaller nodes back in, but maybe only
>>> after you tweak the rack topology to match your disk layout more than
>>> your network layout, to compensate for the unbalanced nodes.
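One common mechanism for the topology tweak suggested above is a script named by net.topology.script.file.name in core-site.xml. A hypothetical sketch that groups hosts into "racks" by capacity class; the 'g'/'s' host-name prefixes follow the naming mentioned elsewhere in this thread, and the rack labels are made up:

```python
#!/usr/bin/env python
# Hypothetical rack-topology script (pointed to by
# net.topology.script.file.name): maps each host to a "rack" by
# capacity class instead of physical network location.
import sys

def rack_for(host):
    short = host.split('.')[0]
    if short.startswith('g'):
        return '/small-nodes'   # smaller-capacity datanodes
    if short.startswith('s'):
        return '/large-nodes'   # larger-capacity datanodes
    return '/default-rack'

if __name__ == '__main__':
    # Hadoop invokes the script with one or more host names/IPs and
    # expects one rack path per argument, space-separated.
    print(' '.join(rack_for(h) for h in sys.argv[1:]))
```

Whether splitting by capacity class rather than physical rack is wise depends on your failure domains; it trades real rack fault tolerance for placement control.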
>>>
>>>
>>> Just my 2 cents
>>>
>>>
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>
>>>> Thanks. We have a 1-1 configuration of drives and folders in all the
>>>> datanodes.
>>>>
>>>> -Tapas
>>>>
>>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm15119b@gmail.com> wrote:
>>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same set of drives, or is it 1-1 between
>>>> folder and drive? If it's set to multiple folders on the same drives, it
>>>> is probably multiplying the amount of "available capacity" incorrectly,
>>>> in that it assumes a 1-1 relationship between folder and total capacity
>>>> of the drive.
>>>>
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>
>>>>> Yes, thanks for pointing that out, but I already know that it is
>>>>> completing the balancing when exiting; otherwise it shouldn't exit.
>>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>>> cleared up from the cluster, even though "df" shows the cluster has
>>>>> about 500 TB of free space.
>>>>>
>>>>> -------
>>>>>
>>>>>
>>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <balaji@balajin.net> wrote:
>>>>>
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is bytes per second. If it is running and exiting, it
>>>>> means it has completed the balancing.
>>>>>
>>>>>
>>>>> On 24 March 2013 11:32, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>>
>>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>>> almost a day or more before exiting and starting over.
>>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I
>>>>>> assume that's bytes, so about 2 gigabytes/sec. Shouldn't that be
>>>>>> reasonable? If it is in bits then we have a problem.
>>>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
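Taking the parameter as bytes per second, as the reply above states, the configured value works out to roughly 1.9 GiB/s per datanode; a quick sanity check:

```python
# Sanity check of the value discussed above, taking
# dfs.balance.bandwidthPerSec as bytes per second.
value = 2 * 10**9                 # configured value, bytes/second
gib_per_sec = value / 2**30       # convert to GiB/s
print(round(gib_per_sec, 2))      # ~1.86 GiB/s per datanode
```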
>>>>>>
>>>>>> -----
>>>>>>
>>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <lists@balajin.net> wrote:
>>>>>>
>>>>>> Are you running balancer? If balancer is running and if it is slow,
>>>>>> try increasing the balancer bandwidth
>>>>>>
>>>>>>
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the follow-up. I don't know whether an attachment will
>>>>>>> pass through this mailing list, but I am attaching a pdf that
>>>>>>> contains the usage of all live nodes.
>>>>>>>
>>>>>>> All nodes starting with the letter "g" are the ones with smaller
>>>>>>> storage space, whereas nodes starting with the letter "s" have larger
>>>>>>> storage space. As you will see, most of the "gXX" nodes are
>>>>>>> completely full whereas the "sXX" nodes have a lot of unused space.
>>>>>>>
>>>>>>> Recently, we have been facing this crisis frequently, as 'hdfs' goes
>>>>>>> into a mode where it is not able to write any further even though the
>>>>>>> total space available in the cluster is about 500 TB. We believe this
>>>>>>> has something to do with the way it is balancing the nodes, but we
>>>>>>> don't understand the problem yet. Maybe the attached PDF will help
>>>>>>> some of you (experts) see what is going wrong here...
>>>>>>>
>>>>>>> Thanks
>>>>>>> ------
>>>>>>>
>>>>>>> The balancer knows about topology, but when it calculates balancing
>>>>>>> it operates only with nodes, not with racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode,
>>>>>>> around line 509.
>>>>>>>
>>>>>>> I was wrong about 350Tb; it is 35Tb. It calculates it in this way:
>>>>>>>
>>>>>>> For example:
>>>>>>> cluster_capacity=3.5Pb
>>>>>>> cluster_dfsused=2Pb
>>>>>>>
>>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>>>> capacity used.
>>>>>>> Then we know the average node utilization
>>>>>>> (node_dfsused/node_capacity*100). The balancer thinks all is good if
>>>>>>> avgutil+10 > node_utilization >= avgutil-10.
>>>>>>>
>>>>>>> The ideal case is that every node uses avgutil of its capacity, but
>>>>>>> for a 12Tb node that is only about 6.9Tb and for a 72Tb node it is
>>>>>>> about 41Tb.
>>>>>>>
>>>>>>> The balancer can't help you.
>>>>>>>
>>>>>>> Show me
>>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>>> you can.
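The ±10 check described above can be sketched like this (a simplification of the behaviour described for Balancer.java, not the actual code), using the 3.5Pb/2Pb numbers from the example:

```python
# Sketch of the balancer's per-node check described above (default
# threshold 10%); a simplification, not the real Balancer.java logic.

def is_balanced(node_used, node_capacity, cluster_used, cluster_capacity,
                threshold=10.0):
    avgutil = cluster_used / cluster_capacity * 100
    node_util = node_used / node_capacity * 100
    # node is "good" if avgutil + 10 > node_util >= avgutil - 10
    return (avgutil - threshold) <= node_util < (avgutil + threshold)

# Cluster from the example: 3.5 Pb capacity, 2 Pb used -> avgutil ~57.14%
cap_tb, used_tb = 3500.0, 2000.0
print(is_balanced(6.8, 12, used_tb, cap_tb))    # True: node ~57% used
print(is_balanced(11.9, 12, used_tb, cap_tb))   # False: node ~99% used
```

This is why a full 12Tb node can sit far outside the window while large nodes stay well inside it, yet the balancer still exits as "done".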
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In the ideal case with replication factor 2, with two nodes of
>>>>>>>> 12Tb and 72Tb you will be able to have only 12Tb of replicated
>>>>>>>> data.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
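The two-node observation above can be generalized. This is my own back-of-the-envelope formula, not anything from the Hadoop code, under the assumption that the two copies of a block must sit on distinct nodes:

```python
def max_replicated_data(node_capacities):
    # With replication factor 2, each block occupies space on two distinct
    # nodes, so usable logical capacity is half the raw total -- unless one
    # node holds more than half the raw space, in which case the remaining
    # nodes cap how many second copies can be placed.
    total = sum(node_capacities)
    return min(total // 2, total - max(node_capacities))

print(max_replicated_data([12, 72]))          # 12 (the two-node case above)
print(max_replicated_data([72] + [12] * 6))   # 72 (more nodes lift the cap)
```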
>>>>>>>>
>>>>>>>>
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a
>>>>>>>> rack must have identical capacity, and racks must have identical
>>>>>>>> capacity.
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> rack1: 1 node with 72Tb
>>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>>
>>>>>>>> It helps with balancing, because the duplicated block must be on
>>>>>>>> another rack.
>>>>>>>>
>>>>>>>>
>>>>>>>> The same question I asked earlier in this message: do multiple
>>>>>>>> racks with the default threshold for the balancer minimize the
>>>>>>>> difference between racks?
>>>>>>>>
>>>>>>>> Why did you select hdfs? Maybe lustre, cephfs, or something else
>>>>>>>> is a better choice.
>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't my decision, and I probably can't change it now. I am
>>>>>>>> new to this cluster and trying to understand a few issues. I will
>>>>>>>> explore the other options you mentioned.
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://balajin.net/blog
>>>>>>>> http://flic.kr/balajijegan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
