hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Write TimeSeries Data and Do Time Based Range Scans
Date Mon, 23 Sep 2013 21:41:47 GMT
Hi All,

I have a secondary index (inverted index) table whose rowkey is based on the
timestamp of an event. Assume the rowkey is <TimeStamp in Epoch>.
Apart from the main_table rowkey, I also store some extra columns in that
table for filtering.

The requirement is to do range-based scans on the basis of event time; hence
the index with this rowkey.
I cannot use a hashing or MD5 digest solution because then I cannot do
range-based scans. Also, I already have an OpenTSDB-like index in another
table for the same dataset. (I have many secondary indexes for the same
dataset.)
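
To make the current layout concrete, this is roughly what the index write
and a time-range scan look like today (just a sketch; the table handle,
family "d" and qualifier "main_key" are made-up names):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeIndexSketch {

  // Index row: rowkey = <TimeStamp in Epoch> (8-byte big-endian long),
  // value = main_table rowkey plus a few extra columns used for filtering.
  static void writeIndexRow(HTable timeIdx, long epochTs, byte[] mainRowKey)
      throws IOException {
    Put put = new Put(Bytes.toBytes(epochTs));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("main_key"), mainRowKey);
    timeIdx.put(put);
  }

  // Range scan over [startTs, stopTs) -- this only works because the rowkey
  // preserves time ordering; a hashed/MD5 key would break it.
  static void scanByTime(HTable timeIdx, long startTs, long stopTs)
      throws IOException {
    ResultScanner scanner = timeIdx.getScanner(
        new Scan(Bytes.toBytes(startTs), Bytes.toBytes(stopTs)));
    try {
      for (Result r : scanner) {
        // process the index row here
      }
    } finally {
      scanner.close();
    }
  }
}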

Problem: When we increase the write workload during stress tests, the
time-based secondary index becomes a bottleneck due to the famous region
hotspotting problem (monotonically increasing rowkeys keep hitting the last
region).
Solution: I am thinking of prefixing the rowkey with a bucket, where bucket =
(<TimeStamp in Epoch> % 10). Then my rowkey becomes:
 <Bucket><TimeStamp in Epoch>
With the above rowkey I can at least alleviate the *WRITE* problem. (I don't
think the problem can be fixed permanently because of the use case
requirement; I would love to be proven wrong.)
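
The write-side key construction I have in mind is roughly this (a sketch;
the bucket count of 10 and the one-byte prefix are just my current
assumptions):

import org.apache.hadoop.hbase.util.Bytes;

public class BucketedKeySketch {
  static final int NUM_BUCKETS = 10;

  // <Bucket><TimeStamp in Epoch>: one-byte bucket prefix followed by the
  // 8-byte big-endian epoch timestamp, so time ordering is kept per bucket.
  static byte[] makeBucketedKey(long epochTs) {
    byte[] key = new byte[1 + Bytes.SIZEOF_LONG];
    key[0] = (byte) (epochTs % NUM_BUCKETS);   // bucket in [0, 9]
    Bytes.putLong(key, 1, epochTs);
    return key;
  }
}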
However, with the above rowkey, whenever I want to *READ* data, every single
range scan has to read data from 10 different regions. This extra read load
is scaring me a bit.
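
In other words, the read path would have to fan out one scan per bucket and
merge on the client, something like the sketch below (again an assumption,
not finished code; results come back time-ordered per bucket only, so a
client-side sort/merge by timestamp is still needed):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BucketedScanSketch {

  // One scan per bucket over [startTs, stopTs), merged on the client.
  static List<Result> scanTimeRange(HTable timeIdx, long startTs, long stopTs)
      throws IOException {
    List<Result> merged = new ArrayList<Result>();
    for (int b = 0; b < 10; b++) {
      byte[] start = new byte[1 + Bytes.SIZEOF_LONG];
      byte[] stop = new byte[1 + Bytes.SIZEOF_LONG];
      start[0] = (byte) b;
      Bytes.putLong(start, 1, startTs);
      stop[0] = (byte) b;
      Bytes.putLong(stop, 1, stopTs);
      ResultScanner scanner = timeIdx.getScanner(new Scan(start, stop));
      try {
        for (Result r : scanner) {
          merged.add(r);          // time-ordered within a bucket only
        }
      } finally {
        scanner.close();
      }
    }
    return merged;                // caller still sorts/merges by timestamp
  }
}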

I am wondering if anyone has a better suggestion/approach to solve this
problem given the constraints I have. Looking for feedback from the
community.

-- 
Thanks & Regards,
Anil Gupta
