hadoop-common-user mailing list archives

From "Ted Dunning" <tdunn...@veoh.com>
Subject RE: Block re-balancing speed/slowness
Date Sat, 10 May 2008 18:06:13 GMT

I use a home-made balancing script that runs considerably faster than this without
noticeable degradation.

It uses the strategy of increasing replicas temporarily and then decreasing them.  Because
I wrote it as a hack, it has a bunch of processes hanging around waiting until it is time
to drop the number of replicas, but it is very effective.  In my experience it can fill 100GB
of an empty datanode in a few tens of minutes.  It would be very much slower if my files were
smaller (they average 500-1000MB).
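
The raise-then-lower idea described above might be sketched roughly as follows. This is not the actual script, only a minimal illustration under stated assumptions: it shells out to `hadoop dfs -setrep`, the normal replication factor is assumed to be 3, and the function names, the boosted factor of 4, and the settle time are all hypothetical.

```python
import subprocess
import time


def setrep_command(path, replication, wait=False, hadoop_bin="bin/hadoop"):
    """Build the argv for `hadoop dfs -setrep -R [-w] <rep> <path>`."""
    argv = [hadoop_bin, "dfs", "-setrep", "-R"]
    if wait:
        argv.append("-w")  # block until the target replication is reached
    argv += [str(replication), path]
    return argv


def boost_then_restore(path, normal=3, boosted=4, settle_seconds=600,
                       run=subprocess.check_call, sleep=time.sleep):
    """Temporarily raise replication so new replicas can be placed on
    other (for example, emptier) datanodes, then drop back to normal.

    Assumption: which replicas get created and later deleted is decided
    by the namenode, so the resulting placement is not guaranteed.
    """
    run(setrep_command(path, boosted, wait=True))
    sleep(settle_seconds)  # hypothetical pause before dropping replicas
    run(setrep_command(path, normal))


# Example call (hypothetical path); requires a working Hadoop install:
# boost_then_restore("/user/example/data", normal=3, boosted=4)
```

The `run` and `sleep` parameters exist only so the sequence can be exercised without a cluster; a real run would leave them at their defaults.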


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Fri 5/9/2008 8:55 PM
To: core-user@hadoop.apache.org
Subject: Block re-balancing speed/slowness
 
Hi,

First off, big thanks to Lohit and Hairong for their help with HDFS "corruption", DN decommissioning
and block re-balancing!

I'm now re-balancing, but as Ted Dunning noted (http://markmail.org/message/fzd33k7a3isijto5),
this seems to be a very slow process.  Here are some concrete numbers:

$ bin/hadoop balancer
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
May 9, 2008 7:14:28 PM            0                 0 KB           100.96 GB              10 GB
May 9, 2008 7:37:28 PM            1            409.66 MB            99.64 GB              10 GB
May 9, 2008 8:00:59 PM            2            840.89 MB             98.3 GB              10 GB
May 9, 2008 8:22:29 PM            3              1.18 GB               97 GB              10 GB
May 9, 2008 8:44:59 PM            4              1.58 GB             95.7 GB              10 GB
May 9, 2008 9:07:30 PM            5               2.1 GB            94.42 GB              10 GB
May 9, 2008 9:29:31 PM            6              2.42 GB            93.09 GB              10 GB
May 9, 2008 9:52:02 PM            7              2.82 GB            91.91 GB              10 GB
May 9, 2008 10:14:02 PM           8              3.47 GB            90.57 GB              10 GB

10 GB in 3 h.... doesn't that seem slow?  I can rsync 1GB of data between 2 (EC2) boxes in
this cluster in about a minute.

$ bc
3*60*60 = 10800 seconds
85899345920/10800 = 7953643 bytes/sec   (85899345920 bytes = 80 GB)
7953643/1024 = 7767 KB/sec
7953643/1024/1024 = 7 MB/sec

Is the block balancer purposely not making 100% use of the available bandwidth for some reason?

Or, wait, I just noticed that the "already moved" and "left to move" numbers don't match up.
Should one be looking at the "bytes being moved" column instead?  In other words, 8x10GB in 3h
above?
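
To make the three readings of the balancer output concrete, here is a small sketch that computes the implied rate for each column between the first and last rows shown. The figures and timestamps come straight from the table; the function name and the choice of 1 GB = 1024^3 bytes are assumptions for illustration.

```python
from datetime import datetime

GIB = 1024 ** 3
MIB = 1024 ** 2
TIME_FORMAT = "%b %d, %Y %I:%M:%S %p"  # e.g. "May 9, 2008 7:14:28 PM"


def rate_mb_per_sec(gigabytes, start, end):
    """Average transfer rate in MB/sec for `gigabytes` moved between
    two balancer timestamps."""
    elapsed = (datetime.strptime(end, TIME_FORMAT)
               - datetime.strptime(start, TIME_FORMAT)).total_seconds()
    return gigabytes * GIB / elapsed / MIB


START = "May 9, 2008 7:14:28 PM"   # iteration 0
END = "May 9, 2008 10:14:02 PM"    # iteration 8

# Reading 1: "Bytes Already Moved" grew from 0 KB to 3.47 GB.
already_moved = rate_mb_per_sec(3.47, START, END)

# Reading 2: "Bytes Left To Move" shrank from 100.96 GB to 90.57 GB.
left_to_move = rate_mb_per_sec(100.96 - 90.57, START, END)

# Reading 3: 8 iterations x 10 GB of "Bytes Being Moved".
being_moved = rate_mb_per_sec(8 * 10, START, END)

for label, rate in [("already moved", already_moved),
                    ("left to move", left_to_move),
                    ("being moved", being_moved)]:
    print(f"{label:>14}: {rate:.2f} MB/sec")
```

The three readings differ by more than an order of magnitude, so which column reflects real progress matters a great deal for the question above.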

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


