From: Alexandra Alecu
To: hbase-user@hadoop.apache.org
Date: Wed, 13 May 2009 10:56:04 -0700 (PDT)
Subject: Hbase 0.19.2 - Large import results in heavily unbalanced hadoop DFS
Message-ID: <23526652.post@talk.nabble.com>

I am an HBase/Hadoop beginner.
As an initial test, I am trying to import about 120 GB of records into one big table in HBase (replication level 2). I have an HBase master and a Hadoop namenode running on two separate machines, and 4 other nodes running the datanodes and regionservers. Each datanode has approx. 400 GB of local storage.

I did a few tests previously with HBase 0.19.1 and kept running into problems related to slow compactions (HBASE-1058). I have now installed HBase 0.19.2, and one thing I noticed is that the disk usage during import is much higher and the datanodes come out very unbalanced. Whereas with HBase 0.19.1 I used to fill about 300 GB, nicely balanced, I have now filled about 700 GB: about 100 GB on each of 3 datanodes, while the fourth node gets completely full (400 GB). This causes the import to slow down and eventually fail, being unable to contact one of the .META. regions.

I stopped HBase and tried to balance HDFS, which informed me:

  09/05/13 17:34:38 INFO balancer.Balancer: Need to move 177.92 GB bytes to make the cluster balanced.

After this, with Hadoop hard at work balancing, it seems to fail to move blocks 50% of the time. Should I worry about errors/warnings like this one?

  09/05/13 17:34:38 WARN balancer.Balancer: Error moving block -6198018159178133648 from 131.111.70.215:50010 to 131.111.70.214:50010 through 131.111.70.216:50010: block move is failed

Checking the balancing process, it looks like the HDFS usage constantly decreases, ending at a value closer to what I expected. Essentially, it looks like the balancing has wiped the data which was causing this one datanode to fill up to almost 100%. Maybe this data was caused by the delayed compaction or by some logs which need to be played on the cluster.
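As a rough sanity check on the figures above, here is a minimal Python sketch of the expected DFS footprint. It assumes the stored data is about the same size as the raw input (it ignores HBase key/value overhead, uncompacted store files and write-ahead logs, which is exactly the part in question), and the function and variable names are made up for illustration.

```python
def expected_dfs_usage_gb(raw_gb, replication):
    """Raw DFS space needed if every byte is stored `replication` times.

    Assumption: on-disk size equals input size (no HBase overhead).
    """
    return raw_gb * replication


RAW_GB = 120        # size of the import, from the message
REPLICATION = 2     # replication level, from the message
DATANODES = 4       # number of datanodes, from the message

expected_total = expected_dfs_usage_gb(RAW_GB, REPLICATION)
expected_per_node = expected_total / DATANODES

# Observed DFS usage reported in the message, per HBase version (GB).
observed_gb = {"0.19.1": 300, "0.19.2": 700}

# Ratio of observed usage to the naive expectation.
overhead = {version: used / expected_total
            for version, used in observed_gb.items()}

print(f"expected total: {expected_total} GB, per node: {expected_per_node} GB")
for version, ratio in overhead.items():
    print(f"HBase {version}: {ratio:.2f}x the naive estimate")
```

Under these assumptions the naive estimate is 240 GB in total, so the 0.19.1 run sits modestly above it while the 0.19.2 run is close to three times higher, which is consistent with a large amount of temporary or not-yet-compacted data.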
This is the situation towards the end of the balancing:

Datanodes available: 4 (4 total, 0 dead)

Name: 1
Configured Capacity: 433309891584 (403.55 GB)
DFS Used: 88593623040 (82.51 GB)
DFS Used%: 20.45%
DFS Remaining%: 78.51%
Last contact: Wed May 13 18:48:09 BST 2009

Name: 2
Configured Capacity: 433309891584 (403.55 GB)
DFS Used: 89317653511 (83.18 GB)
DFS Used%: 20.61%
DFS Remaining%: 78.34%
Last contact: Wed May 13 18:48:10 BST 2009

Name: 3
Configured Capacity: 433309891584 (403.55 GB)
DFS Used: 89644974080 (83.49 GB)
DFS Used%: 20.69%
DFS Remaining%: 78.27%
Last contact: Wed May 13 18:48:10 BST 2009

Name: 4
Configured Capacity: 433309891584 (403.55 GB)
DFS Used: 138044233537 (128.56 GB)
DFS Used%: 31.86%
DFS Remaining%: 67.07%
Last contact: Wed May 13 18:48:10 BST 2009

Before the balancing, datanode no. 4 was using approx. 400 GB.

What are your comments on this behaviour? Is this something that you expected?

Let me know if you need me to provide more information.

Many thanks,
Alexandra Alecu.
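For reference, the per-node figures in the report above can be checked against the balancer's notion of "balanced": each datanode's utilization should be within a threshold of the cluster-wide average utilization. The sketch below is only an illustration of that rule, not the balancer's actual code; it assumes the default threshold of 10 percentage points, and the function names are hypothetical.

```python
# (configured capacity, DFS used) in bytes, copied from the report above.
NODES = {
    "1": (433309891584, 88593623040),
    "2": (433309891584, 89317653511),
    "3": (433309891584, 89644974080),
    "4": (433309891584, 138044233537),
}

THRESHOLD = 10.0  # percentage points; assumed default balancer threshold


def utilization_pct(capacity, used):
    """DFS used as a percentage of configured capacity."""
    return 100.0 * used / capacity


def balance_report(nodes, threshold):
    """Return (cluster average %, {node: (deviation, within_threshold)})."""
    total_capacity = sum(cap for cap, _ in nodes.values())
    total_used = sum(used for _, used in nodes.values())
    average = utilization_pct(total_capacity, total_used)
    report = {}
    for name, (cap, used) in nodes.items():
        deviation = utilization_pct(cap, used) - average
        report[name] = (deviation, abs(deviation) <= threshold)
    return average, report


average, report = balance_report(NODES, THRESHOLD)
print(f"cluster average utilization: {average:.2f}%")
for name, (deviation, ok) in report.items():
    status = "within threshold" if ok else "outside threshold"
    print(f"node {name}: {deviation:+.2f} points ({status})")
```

With these numbers every node, including node 4, falls inside the assumed 10-point band, so under that assumption the cluster would already count as balanced at this stage even though node 4 still holds noticeably more data than the others.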