hbase-user mailing list archives

From Silvio Di gregorio <silvio.digrego...@gmail.com>
Subject Re: hbase schema design
Date Wed, 11 Dec 2013 05:38:14 GMT
Hi
This is typical time-series data. If the row key starts with the timestamp,
all new writes go to a single region server, so you must prefix the row key
to spread the workload:
<some non-monotonic value>_timestamp.
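A minimal sketch of such a key, in case it helps. It only builds the key
bytes and does not talk to HBase. The bucket count, the use of a hash of
asset_key as the non-monotonic prefix, and the function names are all
assumptions for illustration, not something from your schema.

```python
import hashlib
import struct

NUM_BUCKETS = 16  # assumption: hypothetical number of salt buckets


def bucket_for(asset_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Derive a stable, non-monotonic bucket number from the asset key."""
    digest = hashlib.md5(asset_key.encode("utf-8")).digest()
    return digest[0] % num_buckets


def salted_rowkey(asset_key: str, update_ts_ms: int,
                  num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build <bucket>_<timestamp>_<asset_key> as bytes.

    The timestamp is packed big-endian so that keys inside one bucket
    sort in time order when compared byte by byte.
    """
    prefix = b"%02d_" % bucket_for(asset_key, num_buckets)
    return prefix + struct.pack(">Q", update_ts_ms) + b"_" + asset_key.encode("utf-8")


def scan_ranges(start_ts_ms: int, stop_ts_ms: int,
                num_buckets: int = NUM_BUCKETS) -> list:
    """Return one (start_row, stop_row) pair per bucket for a time window.

    start is inclusive and stop is exclusive, matching the usual
    start-row/stop-row convention of a range scan.
    """
    ranges = []
    for bucket in range(num_buckets):
        prefix = b"%02d_" % bucket
        ranges.append((prefix + struct.pack(">Q", start_ts_ms),
                       prefix + struct.pack(">Q", stop_ts_ms)))
    return ranges
```

With a key like this, reading "everything updated since T" takes one range
scan per bucket instead of a single scan, which is the price of spreading
the writes.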
On 11 Dec 2013 00:35, "Steven Wu" <wulinux@gmail.com> wrote:

>
> Hi
>
> I am very new to HBase, still self-learning, and am doing a POC for our
> current project. I have a question about row key design.
>
> I have created a big table (called the asset table) that has more than 50M
> records. Each asset has a unique key (let's call it asset_key).
>
> This table receives continuous updates from an upstream system (around 100
> updates per minute). The clients would like to receive real-time updates from
> us. In the current system, we have two indexed columns (asset_key, update_ts)
> on the asset DB table, so the clients can query the DB table by update_ts
> for the latest updates. However, the DB has now become a bottleneck.
>
> So we are wondering how we could achieve the same functionality in HBase. I
> don't want to use a scan filter on the column, as it will trigger a full
> table scan (correct me if I am wrong on this).
>
>
>
> The best thing I can think of is to build the timestamp into the row key.
> However, we also have a requirement that clients be able to query data
> by the unique asset_key.
>
>
>
> The use case we have is that the system has to support more than 1000
> concurrent users querying the latest updates from this table at the lowest
> possible latency. Also, clients would like to query data by the unique
> asset_key to retrieve records from our system.
>
>
>
>
>
> Really appreciate your thoughts on this.
>
>
>
>
>
>
>
> Regards,
>
>
>
>
>
> Steven
