hbase-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: md5 hash key and splits
Date Fri, 31 Aug 2012 14:55:03 GMT
On Thu, Aug 30, 2012 at 11:52 PM, Stack <stack@duboce.net> wrote:

> On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia <mohitanchlia@gmail.com>
> wrote:
> > In general, isn't it better to split the regions so that the load can be
> > spread across the cluster to avoid hotspots?
> >
>
> Time series data is a particular case [1] and the sematextians have
> tools to help w/ that particular loading pattern.  Is time series your
> loading pattern?  If so, yes, you need to employ some smarts (tsdb
> schema and write tricks or hbasewd tool) to avoid hotspotting.  But
> hotspotting is an issue apart from splits; you can split all you want
> and if your row keys are time series, splitting won't undo the hotspotting.
>
My data is time series. To get a random distribution and still keep the
keys for a given user in the same region, I am thinking of using
md5(userid)+reversetimestamp as the row key. But with this type of key, how
can one do pre-splits? I have 30 nodes.
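
Because an MD5 digest is close to uniformly distributed, one way to pre-split such a table is to cut the key space into equal byte ranges. The following is a minimal sketch of that idea, not a definitive implementation. The helper names (row_key, split_points, region_for), the 4-byte split prefix, the millisecond timestamp, and the choice of one region per node are all assumptions made for illustration.

```python
import bisect
import hashlib
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def row_key(user_id, timestamp_ms):
    """Build md5(userid) + reverse timestamp: 16 digest bytes + 8 bytes."""
    prefix = hashlib.md5(user_id.encode("utf-8")).digest()
    # Subtracting from LONG_MAX makes newer rows sort first within a user.
    reverse_ts = LONG_MAX - timestamp_ms
    return prefix + struct.pack(">q", reverse_ts)


def split_points(num_regions, prefix_len=4):
    """Return num_regions - 1 evenly spaced boundaries over the key space.

    Only the first prefix_len bytes are used (an assumption); since the
    digest is roughly uniform, a short prefix is enough to balance regions.
    """
    space = 256 ** prefix_len
    return [
        (i * space // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


def region_for(key, splits):
    """Index of the region that would hold key, given sorted split points.

    Keys are compared as raw bytes, which is also how HBase orders rows;
    a region's start key is inclusive, hence bisect_right.
    """
    return bisect.bisect_right(splits, key)
```

With 30 nodes, split_points(30) yields 29 boundaries for 30 regions. All rows for one user share the same 16-byte prefix, so region_for returns the same index for them no matter what the timestamp is. The resulting byte arrays could then be supplied as split keys when the table is created; how exactly they are passed depends on the client and HBase version in use.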


> You would split to distribute load over the cluster and HBase should
> be doing this for you w/o need of human intervention (caveat the
> reasons you might want to manually split as listed above by AK and
> Ian).
>
> St.Ack
> 1. http://hbase.apache.org/book.html#rowkey.design
>
