hbase-user mailing list archives

From Suraj Varma <svarma...@gmail.com>
Subject Re: Finding the correct region server
Date Mon, 02 Jul 2012 23:06:02 GMT
If I understand you correctly, you are asking how region splitting works ...
See http://hbase.apache.org/book/regions.arch.html section 9.7.4

In a nutshell, the parent region on your RS1 will split into two
daughter regions on the same RS1. If you have the load balancer turned on,
the master can then "reassign" the daughter regions to other
RegionServers based on the number of regions being served by each RS.
This is unrelated to how many requests RSn may be receiving: "region
load" here simply means the number of regions each RS is currently
serving.
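To make the split-then-rebalance sequence concrete, here is a toy model. It is a hypothetical sketch, not HBase code: the class name, the `split` and `balanceByCount` helpers, and the daughter naming (`r0-a`, `r0-b`) are all invented for illustration. A split keeps both daughters on the parent's server, and a count-based balancer then moves regions from the server holding the most regions to the one holding the fewest. Request rates are deliberately absent from the model, mirroring the point that "region load" is just a region count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of region splitting plus count-based balancing.
// Not HBase's actual balancer; it only illustrates "load = number of regions".
public class SplitAndBalanceSketch {

    // Replace the parent region with two daughters on the SAME server.
    static void split(Map<String, List<String>> assignment, String server, String parent) {
        List<String> regions = assignment.get(server);
        regions.remove(parent);
        regions.add(parent + "-a");
        regions.add(parent + "-b");
    }

    // Move one region at a time from the most-loaded server (by region count)
    // to the least-loaded one, until counts differ by at most one.
    static void balanceByCount(Map<String, List<String>> assignment) {
        while (true) {
            String most = null;
            String least = null;
            for (String s : assignment.keySet()) {
                if (most == null || assignment.get(s).size() > assignment.get(most).size()) most = s;
                if (least == null || assignment.get(s).size() < assignment.get(least).size()) least = s;
            }
            if (assignment.get(most).size() - assignment.get(least).size() <= 1) return;
            List<String> from = assignment.get(most);
            String moved = from.remove(from.size() - 1);
            assignment.get(least).add(moved);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignment = new TreeMap<>();
        assignment.put("RS1", new ArrayList<>(List.of("r0")));
        assignment.put("RSn", new ArrayList<>());

        split(assignment, "RS1", "r0");   // RS1: [r0-a, r0-b], RSn: []
        balanceByCount(assignment);       // one daughter moves to RSn
        System.out.println(assignment);   // {RS1=[r0-a], RSn=[r0-b]}
    }
}
```

Note that nothing in this model looks at which machine the client requests arrive on; only region counts drive the move.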

The scheme you describe below would only work in a very "static"
data/region assignment scenario, where a region always stays on the
same RS until you manually move it (load balancer turned off, region
size tuned up so that splits are rare).
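The following sketch shows why a precomputed "user id to server" table goes stale. It is a hypothetical model, not the HBase client: regions are keyed by their start key, and a row belongs to the region with the greatest start key that is less than or equal to the row key (the first region's start key is empty). The class and method names are invented for illustration, and keys are compared as Java strings, which matches byte-wise ordering only for ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of key-range region lookup. Each region is identified
// by its start key; a row maps to the region whose start key is the greatest
// one <= the row key. Not the real HBase client code.
public class RegionLookupSketch {

    // start key -> server currently hosting the region that begins at that key
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void assign(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    String serverFor(String rowKey) {
        // floorEntry finds the greatest start key <= rowKey; the empty start
        // key of the first region guarantees a match for any row key.
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.assign("", "RS1");                       // one region covering the whole table
        System.out.println(table.serverFor("1111"));   // RS1
        System.out.println(table.serverFor("7777"));   // RS1

        // The region splits at "5000" and the balancer moves the upper daughter.
        table.assign("5000", "RSn");
        System.out.println(table.serverFor("1111"));   // still RS1
        System.out.println(table.serverFor("7777"));   // now RSn
    }
}
```

A client that had pinned row "7777" to RS1 before the split would now be pointing at the wrong server, which is why the sticky scheme only holds while splits and moves are ruled out.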
This is a highly recommended read:

If you are worried about latency, I hope you have also read up on
"Block Cache" and MemStore and sizing them appropriately for your

On Fri, Jun 29, 2012 at 10:15 AM, Ramchander Varadarajan
<ramsci@yahoo-inc.com> wrote:
> Hi all,
> We are evaluating HBase to store some metadata information on a very large scale. As
> of now, our architecture looks like this.
> Machine 1:
>     Runs Client 1
>     Runs Region Server 1
>     Runs Data Node 1
> Machine n:
>     Runs Client n
>     Runs Region Server n
>     Runs Data Node n
> Now, say, we have only one Region for the data set at the moment and it's maxing out,
> and the region is in Region Server 1. If a flood of new requests comes in to Machine n and
> tries to store data, will Region Server n store it locally on its Data Node n, or will
> the requests be routed to Region Server 1, with a new region created there after it splits?
> The reason I ask is that I want to see if a Client can be made sticky to a Region
> Server. That way, if a user with id 1111 comes in, he will be sent to Client 1 all the
> time, because we know Region Server 1 will have his region. We would know that upfront
> by looking at his id. I am just trying to minimize the latency further. (Of course, I
> understand that if nodes are down, there will be ways to route the traffic to another host
> to handle the users that fall in that bucket.)
> thanks in advance
