hive-user mailing list archives

From Shiv Sharma
Subject HIVE vs HBASE for Datawarehousing
Date Wed, 22 Feb 2012 17:36:23 GMT
Four newbie questions:

1. Assuming non-SQL access is acceptable, would HBase work as the store for
a data warehouse?

      In other words, why use Hive for a warehouse rather than HBase? I
understand that Hive provides a SQL interface, but are there other reasons?

2. How does the HBase data model differ from Hive's?

The wiki description of BigTable calls it a
"sparse, distributed multi-dimensional sorted map".

I could not find a corresponding description for HBase, but I assume it
holds for HBase as well.

So:
  2.1 Is the BigTable description true for HBase as well?
  2.2 What is the corresponding description for Hive?
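To make the quoted BigTable description concrete, here is a toy, single-process sketch of a "sparse multi-dimensional sorted map": each cell is addressed by (row, column, timestamp), only cells that were actually written exist, and keys stay sorted by row, then column, then newest timestamp first. The class and method names are hypothetical and are not the HBase client API; distribution across servers, column families as storage units, and persistence are deliberately left out.

```python
from bisect import bisect_left, insort
from typing import Dict, Iterator, List, Optional, Tuple

# Sort key: (row, column, -timestamp) so newer versions of a cell sort first.
Key = Tuple[str, str, int]


class SortedCellMap:
    """Toy model of a Bigtable-style sorted map (hypothetical, illustration only).

    Only written cells are stored (sparse); keys are kept sorted by row,
    then column, then newest timestamp first. Not distributed, not persistent.
    """

    def __init__(self) -> None:
        self._keys: List[Key] = []        # sorted list of (row, column, -timestamp)
        self._cells: Dict[Key, str] = {}  # (row, column, -timestamp) -> value

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        key = (row, column, -timestamp)
        if key not in self._cells:
            insort(self._keys, key)  # insert while keeping the key list sorted
        self._cells[key] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of (row, column), or None if never written."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row: str, stop_row: str) -> Iterator[Tuple[str, str, int, str]]:
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row, in key order."""
        i = bisect_left(self._keys, (start_row,))  # (row,) sorts before any (row, ...)
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1
```

For example, after writing two versions of one cell, get returns the newer one, and get on a cell that was never written returns None, which is what "sparse" means here; scan walks a contiguous, sorted range of row keys.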

3. ETL in Hive

 One typical pattern in traditional ETL is:
    -- for each dimension element in the fact stream, look up the dimension to
see whether the dimension value exists:
         if it exists, get the dimension key;
         if not, insert the new dimension value and use its (new) key for
the current record.
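The lookup-or-insert pattern described above can be sketched in plain Python, with an in-memory dict standing in for the dimension table. This is not Hive code; the function name, the (value, measure) fact shape, and the use of increasing integers as surrogate keys are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def assign_dimension_keys(
    fact_stream: Iterable[Tuple[str, float]],
    dimension: Dict[str, int],
) -> List[Tuple[int, float]]:
    """Hypothetical helper: replace each fact's dimension value with a key.

    `dimension` maps natural value -> surrogate key and is updated in place.
    If a value exists, its key is reused; otherwise a new key is assigned.
    """
    next_key = max(dimension.values(), default=0) + 1  # assumed integer keys
    keyed_facts = []
    for value, measure in fact_stream:
        key = dimension.get(value)
        if key is None:  # not found: insert a new dimension row
            key = next_key
            dimension[value] = key
            next_key += 1
        keyed_facts.append((key, measure))
    return keyed_facts
```

Because the dict is updated in place, a later fact carrying a value that was inserted earlier in the same stream reuses the key just assigned rather than creating a duplicate.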

  3.1 Can this be achieved in Hive?
  3.2 Can it be done in HiveQL?

4. More ETL
I often find myself updating tables to add more context from late-arriving
data. This takes the form of updating columns in dimension tables, updating
an aggregate table, and so on.
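The two kinds of late-arriving updates mentioned above can be sketched in memory as follows: filling in or overwriting attribute columns on existing dimension rows, and folding late facts into a running aggregate. The function names and data shapes are hypothetical, and ignoring updates for unknown dimension keys is an assumption; this only models the row-level updates being asked about, not how Hive would perform them.

```python
from typing import Dict, Iterable, Tuple


def apply_late_dimension_updates(
    dimension: Dict[int, Dict[str, str]],
    updates: Iterable[Tuple[int, str, str]],
) -> None:
    """Hypothetical: set column values on existing dimension rows (key -> {column: value})."""
    for key, column, value in updates:
        if key in dimension:  # assumption: updates for unknown keys are ignored
            dimension[key][column] = value


def apply_late_facts(
    aggregate: Dict[str, float],
    late_facts: Iterable[Tuple[str, float]],
) -> None:
    """Hypothetical: add late-arriving measures into an aggregate table (group -> total)."""
    for group, amount in late_facts:
        aggregate[group] = aggregate.get(group, 0.0) + amount
```

Both functions mutate their tables in place, which is exactly the kind of update a question like 4.1 is about.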

4.1 Can this be achieved in Hive?
4.2 Can it be done in HiveQL?

Thank you,
