hbase-user mailing list archives

From Simon Kelly <simongdke...@gmail.com>
Subject Re: fast scan VS hot regions
Date Fri, 25 May 2012 15:58:40 GMT
Hi Andre

Have a look at HBaseWD from Sematext: https://github.com/sematext/HBaseWD

The strategy there is to prefix monotonic row keys with a bin number. This
spreads the writes across N bins but still allows efficient scans as long
as N is not large: a time-range scan has to be run once per bin (N scans in
total) and the results merged.
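
To make the idea concrete, here is a rough sketch in plain Python. It is not
HBaseWD's actual API and does not talk to HBase at all; the names NUM_BINS,
salted_key, scan_ranges and merge_bins are made up for illustration, and
deriving the bin from the timestamp modulo N is just one possible choice.

```python
import heapq

NUM_BINS = 8  # assumption: a small N keeps the number of scans low


def salted_key(timestamp_ms, num_bins=NUM_BINS):
    """Row key = one-byte bin prefix + 8-byte big-endian timestamp."""
    bin_id = timestamp_ms % num_bins  # hypothetical bin assignment
    return bytes([bin_id]) + timestamp_ms.to_bytes(8, "big")


def scan_ranges(start_ms, stop_ms, num_bins=NUM_BINS):
    """One (start_row, stop_row) pair per bin; stop_row is exclusive."""
    return [
        (bytes([b]) + start_ms.to_bytes(8, "big"),
         bytes([b]) + stop_ms.to_bytes(8, "big"))
        for b in range(num_bins)
    ]


def merge_bins(per_bin_rows):
    """Merge the sorted per-bin results back into time order.

    Each input list is already sorted by salted key, so within a bin it is
    sorted by timestamp; comparing on the key without its prefix byte
    restores the global order.
    """
    return list(heapq.merge(*per_bin_rows, key=lambda k: k[1:]))
```

Consecutive timestamps land in different bins, so writes no longer pile up
on the last region, and a query from day A to day B becomes N short range
scans whose results are merged on the client.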

On May 25, 2012 11:13 AM, "Andre Reiter" <a.reiter@web.de> wrote:

> i'm starting a new project, which is pretty simple
> it will be something like google analytics, but of course a bit smaller
> what is required: web servers handle requests with a kind of generic
> key/value list
> those requests will come at a pretty high rate, let's say 1000 req per
> second
> so far i guess, there will be no problem, to handle that, and to store it
> in the hbase, right?
> on the other hand, of course, the data must be processed and monitored
> that is required to be time based, i.e. i want to get statistics about a
> time period, lets say from day A to day B
> that should work, BUT!
> if i want to have a fast scan, i need to have the time stamp in the row
> key, right? otherwise i will need to make a full scan, which can take a
> lot of time, if there is much data
> but if i have the timestamp in the key, i will end up having hot regions,
> like described here
> http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
> so what would be a better way, to have fast scans without hot regions?
> cheers
> andre
