hive-user mailing list archives

From: Manish.Bhoge <Manish.Bh...@target.com>
Subject: RE: HIVE vs HBASE for Datawarehousing
Date: Thu, 23 Feb 2012 06:08:38 GMT

Shiv,

Neither HBase nor Hive alone is a perfect fit for a data warehouse application. We have to use
Hive and HBase together to feed a traditional data warehouse application.

HBase: HBase is more update-oriented. Insert / update / delete operations are efficient in HBase,
but it does not perform as well as Hive on frequent reads.
Hive: Hive, however, offers a very good interface in HiveQL. Having a SQL-like interface makes data
retrieval easy as well as efficient. HBase, by contrast, is a NoSQL database.

Once you receive the data from the outside world, you can use HBase as a datastore to feed into
your data warehouse, and Hive can serve as an interface to retrieve the data for analysis.

One more aspect you can consider here is the use of Pig scripts, which have proved to be
a very good analysis tool. Here you don't need to maintain a schema, and you can still write
code that reads like a SQL script.


PS: Search the Apache repository for the HBase and Hive integration to see how the two can talk to each other.

Thank You,
Manish

From: Shiv Sharma [mailto:aatman.eq.brahman@gmail.com]
Sent: Wednesday, February 22, 2012 11:06 PM
To: user@hive.apache.org
Subject: HIVE vs HBASE for Datawarehousing

4 Newbie questions:

1. Assuming we are ok with non-SQL access, would HBASE  work as a store for a datawarehouse?

      Basically, why HIVE for a warehouse? Why not HBASE? I understand the SQL interface to
HIVE, but are there other reasons?

2. How is the HBASE data model different from Hive?

BigTable has this wiki description:
"sparse, distributed multi-dimensional sorted map"

I could not find the corresponding description for HBASE, but I assume this is true for HBASE
as well.
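
To make the "sparse, multi-dimensional sorted map" description concrete, here is a minimal in-memory sketch in Python. It is only a toy model under stated assumptions: the class name SparseSortedMap and its methods are hypothetical, it is not the HBase or BigTable API, and the "distributed" part of the description is not modelled at all.

```python
from collections import defaultdict


class SparseSortedMap:
    """Toy model of a map keyed by (row, column, timestamp). Hypothetical, not a real API."""

    def __init__(self):
        # Only cells that were actually written are stored, which is what "sparse" means here.
        self._cells = defaultdict(dict)  # row -> {(column, timestamp): value}

    def put(self, row, column, timestamp, value):
        # Each write adds a new version of the cell instead of overwriting older ones.
        self._cells[row][(column, timestamp)] = value

    def get(self, row, column):
        """Return the newest value stored for (row, column), or None if never written."""
        versions = [
            (ts, value)
            for (col, ts), value in self._cells.get(row, {}).items()
            if col == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row in sorted(self._cells):
            if start_row <= row < stop_row:
                yield row


table = SparseSortedMap()
table.put("row2", "cf:a", 1, "x")
table.put("row1", "cf:a", 1, "old")
table.put("row1", "cf:a", 2, "new")
print(table.get("row1", "cf:a"))
print(list(table.scan("row1", "row3")))
```

The three dimensions are the row key, the column, and the timestamp; rows come back in key order on a scan, and a read returns the newest version of a cell.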

So 2.1  Is the BigTable description true for HBASE as well ?
     2.2  What is the corresponding description for HIVE?

3) ETL in HIVE

 One typical pattern in traditional ETL is :
    -- for each dimension element in the fact stream, look up the dimension to see if the dimension value exists
         if it exists, get the dimension key
         if not, insert the new dimension value and use its (new) key for the current record
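
The lookup-or-insert pattern above can be sketched in plain Python as follows. This only illustrates the logic, not how it would be written in Hive; the function name assign_dimension_keys and the dict-based dimension table are assumptions made for the example.

```python
def assign_dimension_keys(fact_stream, dimension):
    """Attach a surrogate key to every dimension value in the fact stream.

    `dimension` is a dict mapping dimension value -> surrogate key (a hypothetical
    stand-in for a dimension table). It is updated in place when new values appear.
    """
    # Continue numbering after the largest key already present in the dimension.
    next_key = max(dimension.values(), default=0) + 1
    keyed_records = []
    for value in fact_stream:
        if value not in dimension:
            # Value not seen before: insert it and hand out a new key.
            dimension[value] = next_key
            next_key += 1
        # Existing or just-inserted, the record now carries the dimension key.
        keyed_records.append((value, dimension[value]))
    return keyed_records


country_dim = {"US": 1}
records = assign_dimension_keys(["US", "IN", "US", "IN", "UK"], country_dim)
print(records)
print(country_dim)
```

Each record is handled one at a time and the dimension changes as the stream is read, which is the row-by-row behaviour that questions 3.1 and 3.2 ask about.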

  3.1 Can this be achieved in HIVE?
  3.2 Can it be done in HIVE-SQL?


4)  (More ETL)
I often find myself updating tables to add more context from "later arriving data". This takes
the form of updating columns in dimension tables,
or updating an aggregate table and such.

4.1 Can this be achieved in HIVE?
4.2 Can it be done in HIVE-SQL?

Thank you,
Shiv
