hbase-user mailing list archives

From "Steven Wu" <wuli...@gmail.com>
Subject hbase schema design
Date Tue, 10 Dec 2013 23:35:02 GMT



I am very new to HBase, still self-learning, and am doing a POC for our
current project. I have a question about row key design.

I have created a big table (called the asset table) that has more than 50M
records. Each asset has a unique key (let's call it asset_key).

This table receives continuous updates from an upstream system (around 100
updates per minute). The clients would like to receive real-time updates from
us. In the current system, we have two indexed columns (asset_key, update_ts)
on the asset DB table, so the clients can query the DB table based on
update_ts for the latest updates. However, the DB has now become a bottleneck.

So we are wondering how we could achieve the same functionality in HBase. I
don't want to use a scan filter on the column, as it will trigger a full
table scan (correct me if I am wrong on this).


The best thing I could think of is to have the timestamp built into the row
key. However, we still have a requirement that clients would like to query
data based on the unique asset_key.
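
One way the two access patterns might be reconciled is sketched below. This
is an illustration under stated assumptions, not a tested HBase design: it
assumes a main table keyed by asset_key alone, plus a second, hypothetical
index table whose row key is a reversed timestamp followed by asset_key. The
function names and the two-table layout are made up for this example, and
update_ts is assumed to be a non-negative epoch value in milliseconds.

```python
import struct

# Largest signed 64-bit value; subtracting from it reverses the sort order.
MAX_TS = 2**63 - 1


def index_row_key(update_ts_ms: int, asset_key: str) -> bytes:
    """Row key for a hypothetical 'latest updates' index table.

    Row keys sort as raw bytes, so storing MAX_TS - update_ts as a
    big-endian 64-bit integer makes newer updates sort first.
    """
    reversed_ts = MAX_TS - update_ts_ms
    return struct.pack(">q", reversed_ts) + asset_key.encode("utf-8")


def asset_row_key(asset_key: str) -> bytes:
    """Row key for the main asset table: direct lookup by asset_key."""
    return asset_key.encode("utf-8")
```

With keys like these, a client asking for the latest updates would scan the
index table from its first row, and a client asking for a single asset would
do a point lookup on the main table. A key that starts with a timestamp
sends all new writes to the same region, so it may need a salt or bucket
prefix.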


The use case we have is that the system has to support more than 1000
concurrent users querying the latest updates from this table at the lowest
possible latency. Also, clients would like to query data based on the unique
asset_key to retrieve records from our system.



Really appreciate your thoughts on this.










