hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Regions and Rowkeys
Date Tue, 12 May 2015 13:58:46 GMT
Arun:
See the following for details:

http://hbase.apache.org/book.html#_determining_split_points
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html
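
On the "MD5 checksum of what?" question: as I read the Javadoc linked above,
HexStringSplit does not hash anything itself. It divides the hex key space
"00000000" to "FFFFFFFF" into equal slices and assumes your rowkeys are
uniformly distributed hex strings, for example the MD5 of some natural key
that your application computes. Below is a minimal Python sketch of that
arithmetic. It is not the HBase source, and the function name
hex_string_splits is made up for illustration.

```python
def hex_string_splits(num_regions, first=0x00000000, last=0xFFFFFFFF, width=8):
    """Return the num_regions - 1 split points that cut [first, last]
    into equal slices, as zero-padded hex strings.

    Illustrative only: mirrors the idea behind HexStringSplit, not its code.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_range = last - first + 1
    step = key_range // num_regions
    return [format(first + step * i, "0{}x".format(width))
            for i in range(1, num_regions)]


if __name__ == "__main__":
    for split in hex_string_splits(15):
        print(split)
```

For the NUMREGIONS => 15 example this gives 14 split points, running from
11111111 to eeeeeeee, so 15 regions in total.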

Cheers

On Tue, May 12, 2015 at 6:11 AM, Talat Uyarer <talat@uyarer.com> wrote:

> Hi Arun,
>
> HBase decides which region stores each row by its rowkey. With
> HexStringSplit, the RegionSplitter generates region start keys in the
> format of an MD5 checksum, that is, evenly spaced hexadecimal strings.
>
> Talat
>
>
>
> 2015-05-12 15:48 GMT+03:00 Arun Patel <arunp.bigdata@gmail.com>:
> > Thank you.  This helps.
> >
> > So, when I pre-split regions with the command below, SPLITALGO creates
> > the rowkey boundaries for each region?
> >
> > create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
> >
> > I am failing to understand HexStringSplit.  As per the documentation, the
> > format of a HexStringSplit region boundary is the ASCII representation of
> > an MD5 checksum, or any other uniformly distributed hexadecimal value.
> >
> > My question is: MD5 checksum of what?
> >
> > Regards,
> > Arun
> >
> >
> >
> >
> >
> > On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> >> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <arunp.bigdata@gmail.com>
> >> wrote:
> >>
> >> > 1) I have a 10 node HBase cluster.  When I create a table in HBase,
> >> > how many regions will be allocated by default?
> >>
> >>
> >> In HBase, the number of region servers is orthogonal to table partitions.
> >> These two operational details are related but managed independently.
> >>
> >> > I looked at the HBase Master UI and it seems regions are not allocated
> >> > to all the Regionservers by default.  How can I allocate the regions
> >> > in all Region Servers?
> >>
> >>
> >> HBase will evenly balance the regions of all tables it's hosting across
> >> all region servers in the cluster. If you have fewer regions than region
> >> servers, some servers will have no regions to host.
> >>
> >> > Basically, this distributes the data in a better way if I am using a
> >> > salted key. My requirement is to distribute the data across the cluster
> >> > using salted keys.  But is having few regions a constraint?
> >> >
> >>
> >> You're moving in the right direction. The next step would be to split
> >> your table according to some prefix value, presumably related to your
> >> "salting" choice. This will depend on what value you're prepending to the
> >> row keys and the cardinality of those values. Apache Phoenix does this,
> >> for example, with a fixed byte prefix and one pre-split per salt-byte
> >> value (i.e., 0, 1, 2, 3, ... 15).
> >>
> >> > 2) How does the rowkey to region mapping work?  In Cassandra, we have a
> >> > concept of assigning a token range to each node.  A rowkey is assigned
> >> > to a node based on the token range.  How does this work in HBase?
> >>
> >>
> >> HBase is ordered and range-partitioned. Basically, your row keys are
> >> sorted and region boundaries are determined at points within that range.
> >> So if you have rows 'a' - 'z', HBase will define regions as contiguous
> >> segments of this range, 'a' - 'f' and 'g' - 'k' for example. The range of
> >> a region is dictated primarily by the amount of data contained therein.
> >> When a region becomes too big, it will be split in half and two child
> >> regions are created (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f').
> >> Once a region splits, the children are independent and can be moved to
> >> other region servers.
> >>
> >> I explain a bit of this and more in my talk "HBase for Architects". I
> >> link to a video from my blog [0]. As Michael mentioned, there's more
> >> detail published in our book [1], as well as in our other books [2], [3].
> >>
> >> Welcome to HBase ;)
> >> -n
> >>
> >> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
> >> [1]: https://hbase.apache.org/book.html#regions.arch
> >> [2]: http://www.manning.com/dimidukkhurana/
> >> [3]: http://shop.oreilly.com/product/0636920033943.do
> >>
>
>
>
> --
> Talat UYARER
> Websitesi: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>
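
To tie together the two ideas quoted above from Nick (range partitioning by
sorted rowkey, and one pre-split per salt byte), here is a small Python
sketch. It only models the lookup logic; it does not talk to HBase. The
names salt_key and find_region, and the choice of MD5 to derive the salt,
are my own assumptions for illustration and are not what Phoenix or HBase
necessarily do internally.

```python
import bisect
import hashlib


def salt_key(row_key, buckets=16):
    """Prepend one salt byte in [0, buckets) to row_key.

    Hypothetical scheme: the salt is the first byte of the key's MD5
    digest, modulo the bucket count.
    """
    salt = hashlib.md5(row_key).digest()[0] % buckets
    return bytes([salt]) + row_key


def find_region(start_keys, row_key):
    """Return the index of the region holding row_key.

    start_keys must be sorted and begin with b"" (the first region has
    no lower bound). A region covers [its start key, next start key).
    """
    return bisect.bisect_right(start_keys, row_key) - 1


# One pre-split per salt byte value: 16 regions starting at
# b"", b"\x01", ..., b"\x0f".
SALTED_START_KEYS = [b""] + [bytes([i]) for i in range(1, 16)]

if __name__ == "__main__":
    for raw in (b"user-1", b"user-2", b"user-3"):
        key = salt_key(raw)
        print(raw, "-> salt", key[0], "-> region",
              find_region(SALTED_START_KEYS, key))
```

With this layout a key lands in the region whose index equals its salt
byte, so sequential natural keys spread across all 16 regions. The cost is
that a range scan over natural keys has to fan out to every salt bucket.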
