hbase-user mailing list archives

From Nick Dimiduk <ndimi...@gmail.com>
Subject Re: Regions and Rowkeys
Date Tue, 12 May 2015 00:57:01 GMT
On Mon, May 11, 2015 at 3:38 PM, Arun Patel <arunp.bigdata@gmail.com> wrote:

> 1) I have a 10 node HBase cluster.  When I create a table in HBase,
> how many regions will be allocated by default?

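In HBase, the number of region servers is orthogonal to the number of table
partitions (regions). These two operational details are related but managed
independently. By default, a newly created table has a single region unless
you pre-split it when you create it.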

> I looked at the HBase Master UI and it seems regions are not allocated to
> all the RegionServers by default. How can I allocate the regions across all
> RegionServers?

HBase will evenly balance the regions of all tables it's hosting across all
region servers in the cluster. If you have fewer regions than region
servers, some servers will have no regions to host.
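To make the point about idle servers concrete, here is a minimal Python sketch that models balancing as nothing more than dealing regions out to servers by count. The `assign_regions` helper and the server and region names are hypothetical; HBase's real balancer weighs more than raw region counts, so treat this only as an illustration of why three regions cannot occupy ten servers.

```python
from collections import Counter


def assign_regions(regions, servers):
    """Hypothetical, simplified balancer: deal regions out round-robin so that
    per-server region counts differ by at most one. HBase's real balancer
    considers more than region counts."""
    return {region: servers[i % len(servers)] for i, region in enumerate(regions)}


servers = [f"rs{n}" for n in range(1, 11)]  # a 10-node cluster
regions = ["region-1", "region-2", "region-3"]  # a table with only 3 regions

assignment = assign_regions(regions, servers)
load = Counter(assignment.values())  # regions hosted per server
idle = [s for s in servers if load[s] == 0]  # Counter returns 0 for missing keys
print(len(idle))  # 7
```

With 25 regions, the same helper leaves every one of the 10 servers with two or three regions.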

> Basically, this distributes the data in a better way if I am using a salted
> key. My requirement is to distribute the data across the cluster using
> salted keys. But is having few regions a constraint?

You're moving in the right direction. The next step would be to pre-split your
table according to some prefix value, presumably related to your "salting"
choice. This will depend on what value you're prepending to the row keys
and the cardinality of those values. Apache Phoenix does this, for example,
with a fixed single-byte prefix and one pre-split per salt-byte value (e.g., 0,
1, 2, 3, ... 15).
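A salting scheme along these lines can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: 16 buckets to match the 0 to 15 example, CRC32 as the hash, and the helper names `salt` and `salt_split_keys` are all hypothetical choices for illustration, not Phoenix's actual implementation. The split keys it produces are the region boundaries you would supply when creating the table, so that each salt value gets its own region.

```python
import zlib

SALT_BUCKETS = 16  # assumption: 16 buckets, matching the 0..15 example


def salt(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend one salt byte derived from a stable hash of the key.
    CRC32 is an illustrative choice; any deterministic hash works."""
    return bytes([zlib.crc32(row_key) % buckets]) + row_key


def salt_split_keys(buckets: int = SALT_BUCKETS) -> list:
    """Split points for one region per salt value: N buckets need N - 1
    boundaries. The first region starts at the empty key and holds salt 0."""
    return [bytes([b]) for b in range(1, buckets)]


keys = [b"user0001", b"user0002", b"user0003"]
salted = [salt(k) for k in keys]
splits = salt_split_keys()
print(len(splits))  # 15
print(splits[:2])  # [b'\x01', b'\x02']
```

Because the salt is derived from the key itself, a point lookup can recompute it, but a scan over a range of the original keys has to be issued once per bucket and the results merged.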

> 2) How does the rowkey-to-region mapping work? In Cassandra, we have the
> concept of assigning a token range to each node. A row key is assigned to
> a node based on the token range. How does this work in HBase?

HBase is ordered and range-partitioned. Basically, your row keys are sorted
and region boundaries are defined at points within that sorted range. So if
you have rows 'a' - 'z', HBase will define regions as contiguous segments of
this range, for example 'a' - 'f' and 'g' - 'k'. The range of a region is
dictated primarily by the amount of data contained therein. When a region
becomes too big, it is split roughly in half and two child regions are
created (e.g., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region
splits, the children are independent and can be moved to other region servers.
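The mapping from row key to region can be modeled as a binary search over sorted region start keys. The sketch below is a hypothetical in-memory model, not HBase client code: in a real cluster the client looks regions up in the `hbase:meta` table and caches their locations. Keys are Python `bytes`, which compare lexicographically like HBase row keys; the first region starts at the empty key; and the split key is passed in explicitly rather than computed from store file sizes. `region_for` and `split` are illustrative names.

```python
from bisect import bisect_right

# Each region is identified by its start key and ends where the next region
# begins; the last region is unbounded. The first region starts at b"".
start_keys = [b"", b"g", b"l"]  # regions: [''..'g'), ['g'..'l'), ['l'..end)


def region_for(row_key: bytes, starts: list) -> int:
    """Index of the region whose [start, next start) range holds row_key."""
    return bisect_right(starts, row_key) - 1


def split(starts: list, index: int, split_key: bytes) -> list:
    """Return new start keys with region `index` divided at split_key."""
    lo = starts[index]
    hi = starts[index + 1] if index + 1 < len(starts) else None
    if not (lo < split_key and (hi is None or split_key < hi)):
        raise ValueError("split key must fall strictly inside the region")
    return starts[: index + 1] + [split_key] + starts[index + 1 :]


print(region_for(b"c", start_keys))  # 0
print(region_for(b"g", start_keys))  # 1
print(region_for(b"zebra", start_keys))  # 2

# Like 'a' - 'f' becoming 'a' - 'c' and 'd' - 'f': split region 0 at b"d".
children = split(start_keys, 0, b"d")
print(region_for(b"c", children), region_for(b"f", children))  # 0 1
```

Note the half-open ranges: a region includes its start key and excludes the next region's start key, which is why b"g" lands in the second region rather than the first.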

I explain a bit of this and more in my talk "HBase for Architects". I link
to a video from my blog [0]. As Michael mentioned, there's more detail
published in the HBase reference guide [1], as well as in our other books
[2], [3].

Welcome to HBase ;)

[0]: http://www.n10k.com/blog/hbase-for-architects-redux/
[1]: https://hbase.apache.org/book.html#regions.arch
[2]: http://www.manning.com/dimidukkhurana/
[3]: http://shop.oreilly.com/product/0636920033943.do
