hadoop-common-user mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: distcp performing much better for rebalancing than dedicated balancer
Date Thu, 05 May 2011 13:43:16 GMT
The decommissioning was performed solely with refreshNodes, but that's 
somewhat irrelevant because the balancing tests were performed after I 
re-added the 11 empty nodes. (FYI, the drives were formatted with another 
Unix filesystem.) I did notice, though, that the decommissioning showed 
about the same metrics as the balancer test afterwards, which is to say 
not very fast.

On 05/05/2011 02:57 PM, Mathias Herberts wrote:
> Did you explicitly start a balancer or did you decommission the nodes
> using dfs.hosts.exclude and a dfsadmin -refreshNodes?
> On Thu, May 5, 2011 at 14:30, Ferdy Galema<ferdy.galema@kalooga.com>  wrote:
>> Hi,
>> On our 15-node cluster (1GB ethernet and 4x1TB disk per node) I noticed that
>> distcp does a much better job at rebalancing than the dedicated balancer
>> does. We needed to decommission 11 nodes, so that prior to rebalancing we had
>> 4 used and 11 empty nodes. The 4 used nodes had about 25% usage each. Most
>> of our files are of average size: We have about 500K files in 280K blocks
>> and 800K blocks total (blocksize is 64MB).
>> So I changed dfs.balance.bandwidthPerSec to 800100100 and restarted the
>> cluster. I then started the balancer tool and noticed that it moved about
>> 200GB in 1 hour. (I grepped the balancer log for "Need to move").
>> After stopping the balancer I started a distcp.  This tool copied 900GB in
>> just 45 minutes, with an average replication of 2, so its total throughput
>> was around 2.4 TB/hour. Fair enough, it is not purely rebalancing because
>> the 4 overused nodes also get new blocks, but it still performs much better.
>> Munin confirms the much higher disk/ethernet throughputs of the distcp.
>> Are these characteristics to be expected? Either way, can the balancer be
>> boosted even more? (Aside from the dfs.balance.bandwidthPerSec property.)
>> Ferdy.
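
The throughput figures quoted above can be checked with a little arithmetic. The Python sketch below is only an illustration, not part of the original exchange. It assumes decimal units (1 TB = 1000 GB), that "average replication of 2" means every copied byte is written twice, and that "1GB ethernet" refers to a 1 Gbit/s link per node. The function and variable names are made up for the example.

```python
# Figures taken from the quoted message; decimal units assumed (1 TB = 1000 GB).
BALANCER_GB = 200          # data moved by the balancer
BALANCER_HOURS = 1.0
DISTCP_GB = 900            # logical data copied by distcp
DISTCP_HOURS = 45 / 60     # 45 minutes
REPLICATION = 2            # average replication reported for the copy


def throughput_gb_per_hour(gigabytes, hours, replication=1):
    """Physical GB written per hour, counting every replica."""
    return gigabytes * replication / hours


balancer_rate = throughput_gb_per_hour(BALANCER_GB, BALANCER_HOURS)
distcp_rate = throughput_gb_per_hour(DISTCP_GB, DISTCP_HOURS, REPLICATION)

# dfs.balance.bandwidthPerSec is a per-datanode cap in bytes per second.
BANDWIDTH_CAP_BYTES = 800100100
# Assumption: "1GB ethernet" means a 1 Gbit/s link, i.e. 125 MB/s.
GIGABIT_LINK_BYTES = 1_000_000_000 / 8

print(f"balancer: {balancer_rate:.0f} GB/hour")
print(f"distcp:   {distcp_rate:.0f} GB/hour ({distcp_rate / 1000:.1f} TB/hour)")
print(f"ratio:    {distcp_rate / balancer_rate:.0f}x")
print(f"cap above link speed: {BANDWIDTH_CAP_BYTES > GIGABIT_LINK_BYTES}")
```

Under these assumptions distcp comes out roughly 12 times faster than the balancer, and the configured bandwidth cap is already well above what a 1 Gbit/s link can carry, which suggests the cap itself was not the limiting factor in the balancer run.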
