hbase-user mailing list archives

From Andre Reiter <a.rei...@web.de>
Subject fast scan VS hot regions
Date Fri, 25 May 2012 09:13:03 GMT
i'm starting a new project, which is pretty simple
it will be something like google analytics, but of course a bit smaller
what is required: web servers handle requests with a kind of generic key/value list
these requests will come in at a fairly high rate, let's say 1000 req per second
so far, i guess there will be no problem handling that and storing it in hbase, right?

on the other hand, of course, the data must be processed and monitored
this has to be time based, i.e. i want to get statistics about a time period, let's say from day A to day B
that should work, BUT!
if i want to have a fast scan, i need to have the timestamp in the row key, right? otherwise i will need to do a full scan, which can take a lot of time if there is a lot of data
but if i have the timestamp in the key, i will end up with hot regions, as described here
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
so what would be a better way to get fast scans without hot regions?
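
to make the trade-off concrete, here is a minimal python sketch of the two row key layouts in question. it is only an illustration and does not use the hbase client api: the function names, the `NUM_BUCKETS` value and the `request_id` field are all hypothetical. the first layout puts the timestamp first, so a time range is one contiguous scan but every new write lands at the end of the key space. the second puts a small hash-derived bucket byte in front of the timestamp (often called salting), which is one commonly mentioned workaround: writes spread over several key ranges, and a time range becomes one scan per bucket.

```python
import struct
import zlib

# Hypothetical bucket count; in practice it would be tuned to the cluster.
NUM_BUCKETS = 16


def plain_key(ts_millis: int, request_id: str) -> bytes:
    """Timestamp-first key: rows sort by time, so new writes always hit the last key range."""
    return struct.pack(">Q", ts_millis) + request_id.encode("utf-8")


def salted_key(ts_millis: int, request_id: str) -> bytes:
    """Bucket-prefixed key: the leading byte spreads writes over NUM_BUCKETS key ranges."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % NUM_BUCKETS
    return bytes([bucket]) + struct.pack(">Q", ts_millis) + request_id.encode("utf-8")


def scan_ranges(start_millis: int, stop_millis: int):
    """Return one (start_row, stop_row) pair per bucket covering [start_millis, stop_millis)."""
    return [
        (
            bytes([b]) + struct.pack(">Q", start_millis),
            bytes([b]) + struct.pack(">Q", stop_millis),
        )
        for b in range(NUM_BUCKETS)
    ]
```

with the second layout, a query from day A to day B would run the `NUM_BUCKETS` ranges from `scan_ranges` (possibly in parallel) and merge the results, so reads cost more than a single scan but stay far short of a full table scan.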

cheers
andre

