hbase-user mailing list archives

From Sean Bigdatafun <sean.bigdata...@gmail.com>
Subject HBase and Star Schema
Date Mon, 14 Jun 2010 06:52:26 GMT
I am reading a blog post about HBase and its application in OLAP. In that
post, Jean-Daniel mentioned that "If you can afford to denormalize your data
by putting the dimension table data into the same table as the fact table,
then you can get very good read efficiency. For each dimension, you would
have a column family." Can someone give me more details about this approach?

I understand Zohmg did some work in this area, but when I read the thesis
related to this project (
http://github.com/zohmg/zohmg/raw/master/doc/report/msc-report.pdf), it does
not seem to use the above approach that Jean-Daniel suggested (page 32 --
Storage/Data Model describes how Zohmg stores data). Actually, I am not sure
if Zohmg's approach can even scale for a super large dataset with lots of
dimensions -- the storage space will blow.

Can someone give me some detailed explanation of both of the above
approaches to achieve star schema implementation? Let's say we are trying to
model the following problem:

"(date, store_name, product_name, buyer_age) ---> (number of sales, total
number sold)"
In other words, we want to build an OLAP cube over the above 4 dimensions:
the date, the store name, the product name, and the buyer's age (these
correspond to the dimension tables in the star-schema world).
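For concreteness, here is one way I could imagine encoding that cube in a single HBase table: a composite row key built from the four dimension values, with the two measures kept as counter columns. Again, this is just my own sketch (the key format and function names are invented), not something taken from the blog or the Zohmg thesis:

```python
# Sketch: composite row key for the 4-dimension cube
# (date, store_name, product_name, buyer_age) -> (number of sales, total sold).
# The cube is modeled as a plain dict standing in for an HBase table.

def make_row_key(date, store_name, product_name, buyer_age):
    # Fixed field order means a prefix scan on date (or date+store, etc.)
    # would retrieve a contiguous slice of the cube.
    return "|".join([date, store_name, product_name, str(buyer_age)])

cube = {}

def record_sale(date, store, product, age, quantity):
    key = make_row_key(date, store, product, age)
    sales, total = cube.get(key, (0, 0))
    # In real HBase these two measures could be atomic counter columns
    # updated with Increment, avoiding a read-modify-write.
    cube[key] = (sales + 1, total + quantity)

record_sale("2010-06-14", "Acme Market", "Widget", 34, 2)
record_sale("2010-06-14", "Acme Market", "Widget", 34, 3)
```

Is something along these lines what people do in practice, or does the key-space explosion across many dimensions make this as impractical as I suspect Zohmg's layout would be?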

