hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: when to use hive vs hbase
Date Wed, 30 Apr 2014 11:45:23 GMT
Hi Shushant,

Hive and HBase are two different things, so it is not really a matter of
using one versus the other.

Hive is a query engine against HDFS data. Data can be stored in different
formats like flat text, sequence files, Parquet files, or even HBase tables.
HBase is both a query engine (Gets and Scans) and a storage engine on top of
HDFS, which allows you to store data for random reads and random writes.
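
As a rough illustration of what "Gets and Scans" mean, here is a minimal
Python sketch of a table kept sorted by row key. It is a toy model, not the
HBase client API: the ToyTable class, its method names, and the row keys are
all hypothetical. The one thing it borrows from HBase is the idea that rows
are ordered by key, so a single row can be fetched directly and a key range
can be read in order.

```python
import bisect


class ToyTable:
    """Hypothetical in-memory stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put(self, row_key, columns):
        # Random write: insert or update a single row by key.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        # Random read: fetch one row by key, or None if it is absent.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start_row="", stop_row=None):
        # Range read: yield rows with start_row <= key < stop_row, in key order.
        i = bisect.bisect_left(self._keys, start_row)
        while i < len(self._keys):
            key = self._keys[i]
            if stop_row is not None and key >= stop_row:
                break
            yield key, dict(self._rows[key])
            i += 1


table = ToyTable()
table.put("user#002", {"name": "Bob"})
table.put("user#001", {"name": "Alice"})
table.put("event#001", {"type": "click"})

alice = table.get("user#001")                                  # one row by key
user_keys = [key for key, _ in table.scan("user#", "user$")]   # one key range
```

The get touches a single row, while the scan walks only the keys between the
start and stop rows. A scan with no bounds would walk the whole table.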

Then you can also add tools like Phoenix and Impala to the picture, which
will allow you to query the data from HDFS or HBase too.

A good way to know whether HBase is a good fit is to ask yourself how you
are going to write into HBase and read from HBase. HBase is good for
random reads and random writes. If you only do bulk loads and aggregations
(full table scans), HBase is not a good fit. If you do random access (client
information, event details, etc.), HBase is a good fit.

It's a bit oversimplified, but that should give you some starting points.

2014-04-30 4:34 GMT-04:00 Shushant Arora <shushantarora09@gmail.com>:

> I have a requirement to process huge weblogs on a daily basis.
> 1. Data will arrive incrementally in the datastore on a daily basis, and I
> need cumulative and daily distinct user counts from the logs. After that,
> the aggregated data will be loaded into an RDBMS like MySQL.
> 2. Data will be loaded into an HDFS data warehouse on a daily basis, and the
> same data will be fetched from the HDFS warehouse, after some filtering,
> into an RDBMS like MySQL and will be processed there.
> Which data warehouse is suitable for approaches 1 and 2, and why?
> Thanks
> Shushant
