hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: hbase schema design
Date Wed, 11 Dec 2013 04:34:03 GMT
100 writes/updates per minute is a very low number, and HBase can of course sustain roughly
1.7 updates/sec (unless each update is gigabytes in size).
1000 concurrent users with minimal query latency is probably possible, but we do not have enough
information. What is the SLA: requests per second and latency requirements? How large is the
typical result set?

You will definitely need to keep your hot data set in RAM. If you can afford to store the data
twice, and ACID transactions are not a must-have feature:

Have two rows per your asset item:
rowkey1: asset_key + update_time
rowkey2: update_time + asset_key

This basically gives you two covering indexes, one by asset_key and one by update_time. Because
you duplicate the data, you replace the many random lookups (as with a simple index) by one
scan operation on the corresponding index.
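
To make the two row key layouts concrete, here is a minimal sketch in plain Python (no HBase client involved). It assumes a few things that are not in the message above: the one-byte prefixes "a" and "t" that separate the two key families, the NUL separator after asset_key (which assumes asset keys never contain a NUL byte), and the millisecond timestamp packed as 8 big-endian bytes. The big-endian packing matters because HBase orders rows by raw key bytes, so byte order has to agree with time order.

```python
import struct

# Hypothetical one-byte prefixes that keep the two key families apart
# when both kinds of row live in the same table.
BY_ASSET = b"a"
BY_TIME = b"t"


def encode_ts(update_ms: int) -> bytes:
    """Pack a timestamp as 8 big-endian bytes so byte order equals numeric order."""
    return struct.pack(">Q", update_ms)


def key_by_asset(asset_key: str, update_ms: int) -> bytes:
    # rowkey1: asset_key + update_time
    # The NUL separator stops "A1" from being a key prefix of "A10".
    return BY_ASSET + asset_key.encode("utf-8") + b"\x00" + encode_ts(update_ms)


def key_by_time(update_ms: int, asset_key: str) -> bytes:
    # rowkey2: update_time + asset_key
    return BY_TIME + encode_ts(update_ms) + asset_key.encode("utf-8")


# Sorting raw bytes, as the store would, puts the time-leading keys in time order.
time_keys = sorted([
    key_by_time(2000, "A7"),
    key_by_time(1000, "A9"),
    key_by_time(3000, "A1"),
])
```

With keys built this way, "latest updates since T" becomes a contiguous range of time-leading keys, and "everything for one asset" becomes a contiguous range of asset-leading keys.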

On each asset update, insert two rows into the table (you can keep both in the same table), and
make sure you have enough RAM (cache) to keep everything in memory.
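
The double write and the two resulting scans can be sketched as follows. This is a toy model, not HBase code: SortedTable is a hypothetical in-memory stand-in that only mimics rows sorted by raw key bytes, and the "a"/"t" prefixes, the NUL separator and the helper names are assumptions made for illustration. Note that the two puts are not atomic, which matches the condition above that ACID transactions are not required.

```python
import bisect
import struct


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by raw key bytes."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> row data

    def put(self, key: bytes, value: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def ts(update_ms: int) -> bytes:
    # 8-byte big-endian timestamp, so byte order matches time order.
    return struct.pack(">Q", update_ms)


def write_update(table: SortedTable, asset_key: str, update_ms: int, data: dict) -> None:
    """Store the same data under both row keys (two separate, non-atomic puts)."""
    asset = asset_key.encode("utf-8")
    table.put(b"a" + asset + b"\x00" + ts(update_ms), data)  # asset_key + update_time
    table.put(b"t" + ts(update_ms) + asset, data)            # update_time + asset_key


def updates_since(table: SortedTable, since_ms: int) -> list:
    """One range scan over the time-leading keys; b"u" is the first key past the "t" family."""
    return [row for _, row in table.scan(b"t" + ts(since_ms), b"u")]


def history_of(table: SortedTable, asset_key: str) -> list:
    """One range scan over the asset-leading keys for a single asset."""
    prefix = b"a" + asset_key.encode("utf-8") + b"\x00"
    return [row for _, row in table.scan(prefix, prefix[:-1] + b"\x01")]
```

Both queries read one contiguous key range and return the full row data directly, so no second lookup per result is needed. The cost is that every update is stored twice.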

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

From: Steven Wu [wulinux@gmail.com]
Sent: Tuesday, December 10, 2013 3:35 PM
To: user@hbase.apache.org
Subject: hbase schema design


I am very new to HBase, still self-learning, and am doing a POC for our current
project. I have a question about row key design.

I have created a big table (called the asset table) that has more than 50M
records. Each asset has a unique key (let's call it asset_key).

This table receives continuous updates from an upstream system (around 100
updates per min). The clients would like to receive real-time updates from
us. In the current system, we have two indexed columns (asset_key, update_ts) on
the asset DB table, so the clients can query the DB table by update_ts
for the latest updates. However, the DB has now become a bottleneck.

So we are wondering how we could achieve the same function in HBase. I don't
want to use a scan filter on the column, as it will trigger a full table
scan (correct me if I am wrong on this).

The best thing I could think of is to build the timestamp into the row key.
However, we still have a requirement that clients can query data
by the unique asset_key.

Our use case is that the system has to support more than 1000 concurrent
users querying the latest updates from this table at the lowest possible latency.
Clients would also like to query by the unique asset_key to retrieve
records from our system.

I would really appreciate your thoughts on this.



