hadoop-common-user mailing list archives

From Tapas Sarangi <tapas.sara...@gmail.com>
Subject Re: disk used percentage is not symmetric on datanodes (balancer)
Date Mon, 25 Mar 2013 01:25:11 GMT
Thanks. Does this need a restart of Hadoop on the nodes where this modification is made?

-----

On Mar 24, 2013, at 8:06 PM, Jamal B <jm15119b@gmail.com> wrote:

> dfs.datanode.du.reserved
> 
> You could tweak that param on the smaller nodes to "force" the flow of blocks to other nodes. A short-term hack at best, but it should help the situation a bit.
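> 
> A minimal sketch of what that tweak could look like in hdfs-site.xml on the smaller nodes (the reserved value here, roughly 1 TB per volume, is just an illustration; size it to your disks):
> 
>   <property>
>     <name>dfs.datanode.du.reserved</name>
>     <!-- bytes per volume that HDFS leaves unused for non-dfs use; ~1 TB here -->
>     <value>1099511627776</value>
>   </property>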
> 
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <tapas.sarangi@gmail.com> wrote:
> 
> On Mar 24, 2013, at 4:34 PM, Jamal B <jm15119b@gmail.com> wrote:
> 
>> It shouldn't cause further problems, since most of your small nodes are already at their capacity.  You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes.
>> 
>> 
> 
> Thanks.  Can you please specify which dfs properties we can set or modify to force the flow of blocks towards the larger nodes rather than the smaller nodes?
> 
> -----
> 
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <tapas.sarangi@gmail.com> wrote:
>> Hi,
>> 
>> Thanks for the idea, I will give this a try and report back. 
>> 
>> My worry is: if we decommission a small node (one at a time), will it move the data to larger nodes, or choke other smaller nodes? In principle it should distribute the blocks; the point is that it is not distributing the way we expect it to, so do you think this may cause further problems?
>> 
>> ---------
>> 
>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm15119b@gmail.com> wrote:
>> 
>>> Then I think the only way around this would be to decommission the smaller nodes, one at a time, and ensure that the blocks are moved to the larger nodes.
>>> Once complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout more than your network layout, to compensate for the unbalanced nodes.
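>>> 
>>> A rough sketch of the decommission step, assuming the standard excludes-file mechanism (the file path and hostname here are placeholders):
>>> 
>>>   <!-- hdfs-site.xml on the namenode -->
>>>   <property>
>>>     <name>dfs.hosts.exclude</name>
>>>     <value>/etc/hadoop/conf/dfs.exclude</value>
>>>   </property>
>>> 
>>>   # add one small node to the exclude file, then tell the namenode
>>>   echo "g01.example.com" >> /etc/hadoop/conf/dfs.exclude
>>>   hadoop dfsadmin -refreshNodes
>>>   # the node shows "Decommission In Progress" in the web UI until its blocks are re-replicated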
>>> 
>>> Just my 2 cents
>>> 
>>> 
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>> Thanks. We have a 1-1 configuration of drives and folders on all the datanodes.
>>> 
>>> -Tapas
>>> 
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm15119b@gmail.com> wrote:
>>> 
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set of drives, or is it 1-1 between folder and drive?  If it's set to multiple folders on the same drives, it is probably multiplying the amount of "available capacity" incorrectly, in that it assumes a 1-1 relationship between folder and total capacity of the drive.
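>>>> 
>>>> For reference, a 1-1 layout would look something like this in hdfs-site.xml (paths are illustrative):
>>>> 
>>>>   <property>
>>>>     <name>dfs.data.dir</name>
>>>>     <!-- one directory per physical drive -->
>>>>     <value>/data/disk1/dfs,/data/disk2/dfs,/data/disk3/dfs</value>
>>>>   </property>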
>>>> 
>>>> 
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>> Yes, thanks for pointing that out, but I already know that it completes the balancing before exiting; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster, even though "df" shows the cluster has about 500 TB of free space.
>>>> 
>>>> -------
>>>>  
>>>> 
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <balaji@balajin.net> wrote:
>>>> 
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>> 
>>>>> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
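>>>>> 
>>>>> For example, to raise it cluster-wide (the value here, 100 MB/s, is just an illustration):
>>>>> 
>>>>>   hadoop dfsadmin -setBalancerBandwidth 104857600
>>>>> 
>>>>> This should take effect without restarting the datanodes, though the exact dfsadmin invocation can vary with your Hadoop version.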
>>>>> 
>>>>> 
>>>>> On 24 March 2013 11:32, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>> Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it is in bits then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>> 
>>>>> -----
>>>>> 
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <lists@balajin.net> wrote:
>>>>> 
>>>>>> Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.
>>>>>> 
>>>>>> 
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>>> Thanks for the follow-up. I don't know whether an attachment will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>>>>> 
>>>>>> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>>>>> 
>>>>>> Recently, we have been facing a crisis frequently: 'hdfs' goes into a mode where it is not able to write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>>>>> 
>>>>>> Thanks
>>>>>> ------
>>>>>> 
>>>>>>> 
>>>>>>> The balancer knows about topology, but when calculating balancing it operates only with nodes, not with racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>>>> 
>>>>>>> I was wrong about 350 TB / 35 TB; it is calculated this way:
>>>>>>> 
>>>>>>> For example:
>>>>>>> cluster_capacity = 3.5 PB
>>>>>>> cluster_dfsused = 2 PB
>>>>>>> 
>>>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>>>>>> Then we know each node's utilization (node_dfsused / node_capacity * 100). The balancer thinks all is good if avgutil + 10 > node_utilization >= avgutil - 10.
>>>>>>> 
>>>>>>> The ideal case is that every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about 41 TB.
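>>>>>>> 
>>>>>>> Concretely, with avgutil = 57.14% and the default threshold of 10, the balancer only acts on nodes outside the 47.14%-67.14% band:
>>>>>>> 
>>>>>>>   12 TB node: "balanced" anywhere from 12 * 0.4714 = 5.7 TB to 12 * 0.6714 = 8.1 TB used
>>>>>>>   72 TB node: "balanced" anywhere from 72 * 0.4714 = 33.9 TB to 72 * 0.6714 = 48.3 TB used
>>>>>>> 
>>>>>>> So a completely full 12 TB node is out of band, but once it drops below roughly 8 TB used, the balancer stops moving blocks off it.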
>>>>>>> 
>>>>>>> The balancer can't help you.
>>>>>>> 
>>>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>>>> 
>>>>>>>> In the ideal case with replication factor 2, with two nodes of 12 TB and 72 TB, you will be able to have only 12 TB of replicated data.
>>>>>>> 
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>> 
>>>>>>>> 
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity.
>>>>>>>> For example:
>>>>>>>> 
>>>>>>>> rack1: 1 node with 72Tb
>>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>> 
>>>>>>>> It helps with balancing, because the duplicate block must be on another rack.
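>>>>>>>> 
>>>>>>>> A minimal sketch of wiring that up with a script-based rack mapping (the property name is the Hadoop 1.x one; the paths and hostname patterns are illustrative, and note the namenode may pass IPs rather than hostnames, so adjust the matching accordingly):
>>>>>>>> 
>>>>>>>>   <property>
>>>>>>>>     <name>topology.script.file.name</name>
>>>>>>>>     <value>/etc/hadoop/conf/rack-map.sh</value>
>>>>>>>>   </property>
>>>>>>>> 
>>>>>>>>   #!/bin/sh
>>>>>>>>   # rack-map.sh: print one rack name per host argument
>>>>>>>>   for host in "$@"; do
>>>>>>>>     case "$host" in
>>>>>>>>       s*) echo /rack-large ;;   # larger-capacity nodes
>>>>>>>>       g*) echo /rack-small ;;   # smaller-capacity nodes
>>>>>>>>       *)  echo /default-rack ;;
>>>>>>>>     esac
>>>>>>>>   done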
>>>>>>>> 
>>>>>>> 
>>>>>>> The same question I asked earlier in this thread: do multiple racks with the default threshold for the balancer minimize the difference between racks?
>>>>>>> 
>>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS, or something else would be a better choice.
>>>>>>> 
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore the other options you mentioned.
>>>>>>> 
>>>>>>> -- 
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>> 
>>>>> -- 
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan

