hbase-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: How to design a data warehouse in HBase?
Date Thu, 13 Dec 2012 09:42:25 GMT
Hi there,

   If you are really planning a warehousing solution, then I would
suggest you have a look at Apache Hive. It provides warehousing
capabilities on top of a Hadoop cluster. It also provides an SQL-like
interface to the data stored in your warehouse, which would be very
helpful if you are coming from an SQL background.
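
As a rough illustration of the three counts asked for in the quoted
message below (devices per day, new devices per day, accumulated
devices), here is a minimal Python sketch over the three sample rows.
It is plain in-memory Python, not Hive or HBase code; the function name
daily_device_stats and the result keys are made up for this example.

```python
from collections import defaultdict

# Sample rows from the quoted table: (id, login_time, device_id).
logins = [
    (1, "2012-12-12 12:12:12", "abcdef"),
    (2, "2012-12-12 19:12:12", "abcdef"),
    (3, "2012-12-13 10:10:10", "defdaf"),
]


def daily_device_stats(rows):
    """Return per-day distinct, new, and accumulated device counts.

    Hypothetical helper: it only shows what the three requirements mean,
    not how a warehouse engine would compute them at scale.
    """
    # Group distinct device ids by the date part of the login time.
    devices_by_day = defaultdict(set)
    for _row_id, login_time, device_id in rows:
        day = login_time.split()[0]
        devices_by_day[day].add(device_id)

    seen = set()  # every device seen on any earlier day
    stats = {}
    for day in sorted(devices_by_day):
        todays = devices_by_day[day]
        new_devices = todays - seen  # never logged in before this day
        seen |= todays
        stats[day] = {
            "devices": len(todays),            # requirement 1
            "new_devices": len(new_devices),   # requirement 2
            "accumulated_devices": len(seen),  # requirement 3
        }
    return stats


if __name__ == "__main__":
    for day, counts in daily_device_stats(logins).items():
        print(day, counts)
```

In a SQL-like interface the same figures would normally be expressed as
grouped distinct counts per day; the loop above only spells out the
logic.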

HTH



Regards,
    Mohammad Tariq



On Thu, Dec 13, 2012 at 2:43 PM, bigdata <bigdatabase@outlook.com> wrote:

> Thanks. I think a real example would help me understand your
> suggestions.
> Now I have a relational table:
>
>   ID   LoginTime             DeviceID
>   1    2012-12-12 12:12:12   abcdef
>   2    2012-12-12 19:12:12   abcdef
>   3    2012-12-13 10:10:10   defdaf
>
> There are several requirements for this table:
> 1. How many devices log in each day?
> 2. For a given day, how many new devices log in (devices that have
>    never logged in before)?
> 3. For a given day, how many accumulated devices have logged in?
>
> How can I design HBase tables to calculate these figures? My current
> solution is:
>
> Table A:
>   rowkey:            date-deviceid
>   column family:     login
>   column qualifier:  2012-12-12 12:12:12 / 2012-12-12 19:12:12 ...
>
> Table B:
>   rowkey:         deviceid
>   column family:  null or any value
>
> For req #1, I can scan table A with a PrefixFilter on the rowkey to
> select one specific date, and count the records.
> For req #2, I look up each deviceid in table B and count the results.
> For req #3, I count table A with a PrefixFilter, as in req #1.
> Is this OK? Or are there better solutions?
> Thanks!!
>
> > CC: user@hbase.apache.org
> > From: michael_segel@hotmail.com
> > Subject: Re: How to design a data warehouse in HBase?
> > Date: Thu, 13 Dec 2012 08:43:31 +0000
> > To: user@hbase.apache.org
> >
> > You need to spend a bit of time on Schema design.
> > You need to flatten your Schema...
> > Implement some secondary indexing to improve join performance...
> >
> > Depends on what you want to do... There are other options too...
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Dec 13, 2012, at 7:09 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> >
> > > > For OLAP type queries you will generally be better off with a truly
> > > > column oriented database.
> > > > You can probably shoehorn HBase into this, but it wasn't really
> > > > designed with raw scan performance along single columns in mind.
> > >
> > >
> > >
> > > ________________________________
> > > From: bigdata <bigdatabase@outlook.com>
> > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > Sent: Wednesday, December 12, 2012 9:57 PM
> > > Subject: How to design a data warehouse in HBase?
> > >
> > > Dear all,
> > > > We have a traditional star-model data warehouse in an RDBMS, and
> > > > now we want to move it to HBase. After studying HBase, I learned
> > > > that HBase is normally queried by rowkey:
> > > > 1. full rowkey (fastest)
> > > > 2. rowkey filter (fast)
> > > > 3. column family/qualifier filter (slow)
> > > > How can I design the HBase tables to implement the warehouse
> > > > functions, such as:
> > > > 1. Query by DimensionA
> > > > 2. Query by DimensionA and DimensionB
> > > > 3. Sum, count, distinct ...
> > > > In my opinion, I should create several HBase tables with all
> > > > combinations of the different dimensions as the rowkey. This
> > > > solution will lead to huge data duplication. Are there any good
> > > > suggestions for solving this?
> > > Thanks a lot!
>
>
