hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject load balancing on cluster whose machines have low memory
Date Thu, 28 Apr 2011 16:29:49 GMT
Modifying the subject to reflect on-going discussion.
I assume you're running my patched version with HBASE-3609, which is not in
the 0.90 branch.
If you start with a single region in your table, keep in mind that the load
balancer works under an assumption:
i.e. that the regions were already evenly spread across the servers.

I suggest you study the distribution of the row keys in your table(s) and
pre-split when you create the table(s).
From my observation, HBASE-3609 is able to distribute those regions across
region servers.
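
A minimal sketch of computing split keys for pre-splitting. It assumes the
row keys begin with a two-digit lowercase hex prefix (for example a hash of
the document id) that is roughly uniformly distributed. That is an assumption
for illustration only; your real key distribution decides the boundaries.
The class and method names are hypothetical. The resulting array is what you
would pass to HBaseAdmin.createTable(HTableDescriptor, byte[][]).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase.
class SplitKeys {
    // Returns numRegions - 1 boundaries, evenly spaced over the
    // two-hex-digit prefix space "00".."ff" (ASSUMED key layout).
    static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 2..256");
        }
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            int boundary = i * 256 / numRegions;
            keys[i - 1] = String.format("%02x", boundary)
                    .getBytes(StandardCharsets.US_ASCII);
        }
        return keys;
    }

    public static void main(String[] args) {
        // One region per slave in a 6-slave cluster; print the boundaries.
        for (byte[] key : hexSplitKeys(6)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
        // On a real cluster the array would go to
        // HBaseAdmin.createTable(tableDescriptor, splitKeys).
    }
}
```

If the keys are not uniformly distributed, evenly spaced boundaries will not
give evenly sized regions; sample the real keys and pick boundaries from the
sample instead.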

You can also try the patch from
https://issues.apache.org/jira/browse/HBASE-3779 which puts one of the
daughter regions on an underloaded server.
The solution there is not efficient but it works.
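
To illustrate the idea only (this is not the code from the HBASE-3779 patch),
here is a sketch of picking an underloaded server by region count. The class
name, method name and server names are hypothetical, and it assumes you
already have a per-server region count, e.g. read off the master web UI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase and not the HBASE-3779 patch.
class DaughterPlacement {
    // Returns the server hosting the fewest regions, or null if the map
    // is empty. Ties go to the first server in iteration order.
    static String leastLoaded(Map<String, Integer> regionCounts) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : regionCounts.entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up counts: rs1 hosts about half again as many as the others.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("rs1", 900);
        counts.put("rs2", 520);
        counts.put("rs3", 540);
        System.out.println(leastLoaded(counts));
    }
}
```

Region count is only a proxy for load; a server with few but hot regions can
still be the busiest one.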

Thanks for sharing your experience.

On Thu, Apr 28, 2011 at 6:54 AM, Stan Barton <bartx007@gmail.com> wrote:

> Yes, these high limits are for the user running the hadoop/hbase processes.
> The systems run on a cluster of 7 machines (1 master, 6 slaves). Each has
> one processor, two cores and 3.5GB of memory. I am using about 800MB for
> hadoop (version CDH3B2) and 2.1GB for HBase (version 0.90.2). There is 6TB
> on four disks per machine. Three zookeepers. The database contains more
> than 3500 regions and the table that was being fed already had about 300
> regions. The table was fed incrementally using HTable.put(). The data are
> documents ranging in size from a few bytes to megabytes, with the upper
> limit set to 10MB per inserted doc.
> The configuration files:
> hadoop/core-site.xml http://pastebin.ca/2051527
> hadoop/hadoop-env.sh http://pastebin.ca/2051528
> hadoop/hdfs-site.xml http://pastebin.ca/2051529
> hbase/hbase-site.xml http://pastebin.ca/2051532
> hbase/hbase-env.sh http://pastebin.ca/2051535
> Because the nproc was high I inspected the .out files of the RSs and
> found one which indicated that all the IPCs OOMEd. Unfortunately I don't
> have those files because they got overwritten after a cluster restart. So
> that means that it was OK on the client side. The funny thing is that all
> RS processes were up and running, only the one with the OOMEd IPCs did not
> really communicate (after trying to restart the importing process no
> inserts went through). So the cluster seemed OK - I was storing statistics
> that were apparently served by another RS and those were also listed OK.
> As I mentioned, the log of the bad RS did not mention that anything wrong
> had happened.
> My observation was: the regions were spread over all RSs, but the crashed
> RS served the most of them, about half again as many as any other, and was
> therefore accessed more than the others. I have discussed the load
> balancing in HBase 0.90.2 with Ted Yu already.
> The balancer needs to be tuned, I guess, because when the table is created
> and loaded from scratch, the regions of the table are not balanced equally
> (in terms of numbers) across the cluster, and I guess the RS that hosted
> the very first region ends up serving the majority of the regions as they
> are split. That imposes a larger load on that RS, which makes it more prone
> to failures (like my OOME) and kills performance.
> I have resumed the process after rebalancing the regions beforehand, and
> was achieving a higher data ingestion rate and also did not run into the
> OOME on any single RS. Right now I am trying to reproduce the incident.
> I know that my scenario would require better machines, but those are what
> I have now, and I am running stress tests before going to production. In
> comparison with 0.20.6, 0.90.2 is less stable regarding insertion, but it
> scales sub-linearly (v0.20.6 did not scale on my data) in terms of random
> access queries (including multi-versioned data) - I have done an extensive
> comparison regarding this.
> Stan
