hbase-user mailing list archives

From bigdata <bigdatab...@outlook.com>
Subject RE: How to design a data warehouse in HBase?
Date Thu, 13 Dec 2012 09:13:03 GMT
Thanks. I think a real example is better for me to understand your suggestions.
Now I have a relational table:

ID   LoginTime             DeviceID
1    2012-12-12 12:12:12   abcdef
2    2012-12-12 19:12:12   abcdef
3    2012-12-13 10:10:10   defdaf

There are several requirements for this table:
1. How many devices log in each day?
2. For a given day, how many new devices log in (devices that have never logged in before)?
3. For a given day, what is the accumulated number of devices that have logged in?

How can I design HBase tables to calculate these figures? My current solution is:

Table A:
  rowkey: date-deviceid
  column family: login
  column qualifier: 2012-12-12 12:12:12/2012-12-12

Table B:
  rowkey: deviceid
  column family: null or any value

For req #1, I can scan table A with a PrefixFilter on the rowkey to select one specific date, and count the records.
For req #2, I do a get on table B for each deviceid and count the results.
For req #3, I count table A with a PrefixFilter, as in #1.
Is this OK? Or are there better solutions?
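
To make the proposal concrete, here is a minimal in-memory sketch of the two tables using the three sample rows. It is not HBase client code: plain dictionaries stand in for the tables, and a sorted-key prefix scan stands in for a scan with a PrefixFilter. Two things are assumptions rather than part of the message: table B is assumed to store each device's first login date as its value (the message only says "null or any value"), and "accumulated" is read as the number of distinct devices seen up to and including the given day. All function names are hypothetical.

```python
from bisect import bisect_left

# Sample logins from the message: (login time, device id).
LOGINS = [
    ("2012-12-12 12:12:12", "abcdef"),
    ("2012-12-12 19:12:12", "abcdef"),
    ("2012-12-13 10:10:10", "defdaf"),
]


def build_tables(logins):
    """Model table A (rowkey date-deviceid) and table B (rowkey deviceid)."""
    table_a = {}  # rowkey -> {qualifier: value}; each qualifier is a login time
    table_b = {}  # deviceid -> first login date (assumed value, not in the message)
    for login_time, device in sorted(logins):
        date = login_time[:10]
        table_a.setdefault(f"{date}-{device}", {})[login_time] = b""
        # Logins are processed in time order, so the first date seen is kept.
        table_b.setdefault(device, date)
    return table_a, table_b


def prefix_scan(table, prefix):
    """Return the rowkeys that start with prefix, walking keys in sorted order."""
    keys = sorted(table)
    matches = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches


def devices_per_day(table_a, date):
    """Req #1: one row per device per day, so count the rows for that date."""
    return len(prefix_scan(table_a, date + "-"))


def new_devices(table_b, date):
    """Req #2: devices whose first login falls on the given date."""
    return sum(1 for first in table_b.values() if first == date)


def accumulated_devices(table_b, date):
    """Req #3 (assumed reading): distinct devices seen up to and including date."""
    return sum(1 for first in table_b.values() if first <= date)
```

Under these assumptions, req #2 and req #3 both read table B, which in a real table means a full scan of B unless the first login date is also indexed by date.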

> CC: user@hbase.apache.org
> From: michael_segel@hotmail.com
> Subject: Re: How to design a data warehouse in HBase?
> Date: Thu, 13 Dec 2012 08:43:31 +0000
> To: user@hbase.apache.org
> You need to spend a bit of time on Schema design.
> You need to flatten your Schema...
> Implement some secondary indexing to improve join performance...
> Depends on what you want to do... There are other options too...
> Sent from a remote device. Please excuse any typos...
> Mike Segel
> On Dec 13, 2012, at 7:09 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> > For OLAP type queries you will generally be better off with a truly column oriented
> > You can probably shoehorn HBase into this, but it wasn't really designed with raw scan performance along single columns in mind.
> > 
> > 
> > 
> > ________________________________
> > From: bigdata <bigdatabase@outlook.com>
> > To: "user@hbase.apache.org" <user@hbase.apache.org> 
> > Sent: Wednesday, December 12, 2012 9:57 PM
> > Subject: How to design a data warehouse in HBase?
> > 
> > Dear all,
> > We have a traditional star-schema data warehouse in an RDBMS, and now we want to move it to HBase. After studying HBase, I learned that HBase is normally queried by rowkey:
> > 1. full rowkey (fastest)
> > 2. rowkey filter (fast)
> > 3. column family/qualifier filter (slow)
> > How can I design the HBase tables to implement the warehouse functions, such as:
> > 1. Query by DimensionA
> > 2. Query by DimensionA and DimensionB
> > 3. Sum, count, distinct ...
> > In my opinion, I would have to create several HBase tables, with every combination of dimensions as the rowkey. This solution will lead to huge data duplication. Are there any good suggestions for solving this?
> > Thanks a lot!
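
On the duplication concern in the original question: because rowkeys are stored in sorted order, one composite rowkey that puts DimensionA first can serve both "query by DimensionA" (a prefix scan) and "query by DimensionA and DimensionB" (a full-rowkey lookup). The sketch below illustrates that with a sorted in-memory key list rather than a real HBase table. The separator, the dimension values, the measures, and the function names are all hypothetical.

```python
from bisect import bisect_left

SEP = "|"  # hypothetical separator; assumed never to occur in dimension values


def make_rowkey(dim_a, dim_b):
    """Composite rowkey with DimensionA leading, then DimensionB."""
    return f"{dim_a}{SEP}{dim_b}"


# Hypothetical fact rows: rowkey -> measure.
FACTS = {
    make_rowkey("east", "phone"): 3,
    make_rowkey("east", "tablet"): 5,
    make_rowkey("west", "phone"): 7,
}
KEYS = sorted(FACTS)  # stands in for HBase's sorted rowkey order


def scan_prefix(sorted_keys, prefix):
    """Return keys starting with prefix, stopping at the first non-match."""
    out = []
    for key in sorted_keys[bisect_left(sorted_keys, prefix):]:
        if not key.startswith(prefix):
            break
        out.append(key)
    return out


def sum_by_a(dim_a):
    """Query by DimensionA: prefix scan on the leading key component."""
    return sum(FACTS[k] for k in scan_prefix(KEYS, dim_a + SEP))


def sum_by_a_and_b(dim_a, dim_b):
    """Query by DimensionA and DimensionB: a full-rowkey lookup."""
    return FACTS.get(make_rowkey(dim_a, dim_b), 0)


def sum_by_b(dim_b):
    """Query by DimensionB alone: no usable prefix, so every row is examined."""
    return sum(v for k, v in FACTS.items() if k.split(SEP)[1] == dim_b)
```

In this layout only queries that skip the leading dimension, such as `sum_by_b`, fall back to examining every row. Those are the cases where a second table or a secondary index keyed on DimensionB would be worth considering, rather than one table per combination of dimensions.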