Reply-To: hbase-user@hadoop.apache.org
Message-ID: <48F7A70C.9010607@duboce.net>
Date: Thu, 16 Oct 2008 13:41:48 -0700
From: stack <stack@duboce.net>
To: hbase-user@hadoop.apache.org
Subject: Re: HBase and hadoop cluster rebalance
References: <cbaae0a20810151448n621251ecma2864d410cce7d4a@mail.gmail.com>
In-Reply-To: <cbaae0a20810151448n621251ecma2864d410cce7d4a@mail.gmail.com>

Daniel Ploeg wrote:
> Hi all,
>
> I performed a cluster rebalance on my test cluster yesterday (5 regionserver
> / datanodes each with approx 400GB - total approx 2TB HDFS) and I would like
> to know if the mailing lists have seen similar results to what I've seen.
>   

I talked to the lads running HBase here at Powerset.  They believe they
have seen something similar when they grow the cluster by a
significant percentage (20-30%).  The addition of new machines brings on
a rebalancing, and thereafter HBase runs "faster".

> I had a single table with a single column family and loaded it up so that it
> just about filled the entire cluster. Actually one or two of the nodes had
> run out of space, yet the fifth machine only had 50% of its disks utilised
> (which is why I thought a rebalance was in order). There are a total of 1475
> regions in the cluster. Prior to starting the rebalance the cluster only had
> about 250GB left at its disposal. After the rebalance I now have almost
> 800GB free.
>   

With 1475 regions, you should update to 0.18.1 (coming soon).

> Furthermore, I was performing read tests prior to the rebalance and getting
> a response time of approx 500ms per row (each row has 10000 column instances
> of the column family which were deserialised as part of the test). After the
> rebalance my read times reduced to around 340ms.
>
>   
If you can get by with fewer columns per column family, you'll get a bit
better performance: see HBASE-867.
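
For a rough feel of your numbers, here is a back-of-the-envelope sketch in
Python.  It only uses the figures from your mail (10000 columns per row,
approx 500ms before and 340ms after the rebalance).  The helper names are
made up for illustration, and the linear-scaling estimate is purely an
assumption on my part, not something measured on HBase.

```python
# Back-of-the-envelope arithmetic on the read times quoted in this thread.
# All function names are hypothetical; nothing here talks to HBase.

COLUMNS_PER_ROW = 10_000  # columns deserialised per row (from the mail)
BEFORE_MS = 500.0         # approx per-row read time before the rebalance
AFTER_MS = 340.0          # approx per-row read time after the rebalance


def per_column_ms(row_ms, columns=COLUMNS_PER_ROW):
    """Average time per column, assuming cost is spread evenly over columns."""
    return row_ms / columns


def reduction_pct(before_ms, after_ms):
    """Relative drop in read time, as a percentage of the 'before' figure."""
    return (before_ms - after_ms) / before_ms * 100.0


def estimated_row_ms(row_ms, columns, base_columns=COLUMNS_PER_ROW):
    """Row read time at another column count, ASSUMING linear scaling."""
    return per_column_ms(row_ms, base_columns) * columns


if __name__ == "__main__":
    print(f"per column before: {per_column_ms(BEFORE_MS):.3f} ms")
    print(f"per column after:  {per_column_ms(AFTER_MS):.3f} ms")
    print(f"reduction:         {reduction_pct(BEFORE_MS, AFTER_MS):.1f} %")
    # Hypothetical: the same row cut down to 1000 columns, linear scaling.
    print(f"1000 columns:      {estimated_row_ms(AFTER_MS, 1000):.1f} ms")
```

So the rebalance took roughly a third (about 32%) off your per-row read
time.  Real reads carry fixed per-row overhead, so treat the linear
estimate for smaller rows as a loose guide at best.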

Good on you Daniel,
St.Ack