hbase-user mailing list archives

From Shushant Arora <shushantaror...@gmail.com>
Subject Re: when to use hive vs hbase
Date Wed, 30 Apr 2014 12:13:23 GMT
Hi Jean,

Thanks for the explanation.

I still have one doubt: why is HBase not a good fit for bulk loads and
aggregations (full table scans)? Hive also has to read every row to compute
an aggregation, just as HBase does.
Can you explain more?

On Wed, Apr 30, 2014 at 5:15 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Shushant,
> Hive and HBase are two different things. You cannot really choose one
> versus the other.
> Hive is a query engine against HDFS data. Data can be stored in different
> formats like flat text, sequence files, Parquet files, or even HBase tables.
> HBase is both a query engine (Gets and Scans) and a storage engine on top of
> HDFS which allows you to store data for random reads and random writes.
> Then you can also add tools like Phoenix and Impala to the picture, which
> will allow you to query the data from HDFS or HBase too.
> A good way to know whether HBase is a good fit is to ask yourself how you
> are going to write into HBase or read from HBase. HBase is good for
> random reads and random writes. If you only do bulk loads and aggregations
> (full table scans), HBase is not a good fit. If you do random access (client
> information, event details, etc.), HBase is a good fit.
> It's a bit oversimplified, but that should give you some starting points.
> 2014-04-30 4:34 GMT-04:00 Shushant Arora <shushantarora09@gmail.com>:
> > I have a requirement to process huge weblogs on a daily basis.
> >
> > 1. Data will arrive incrementally in the datastore on a daily basis, and I
> > need cumulative and daily distinct user counts from the logs. After that,
> > the aggregated data will be loaded into an RDBMS like MySQL.
> >
> > 2. Data will be loaded into an HDFS data warehouse on a daily basis, and
> > after some filtering it will be fetched from the HDFS warehouse into an
> > RDBMS like MySQL and processed there.
> >
> > Which data warehouse is suitable for approaches 1 and 2, and why?
> >
> > Thanks
> > Shushant
> >
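
The two access patterns contrasted in this thread can be sketched in plain Python. This is only a toy illustration, not Hive or HBase code: the record layout (date, user id), the sample data, and the function names are all assumptions made for the example. The first function touches every record to compute daily and cumulative distinct user counts (the "full table scan" style of work); the second answers a question about a single key through an index (the "random read" style of work).

```python
from collections import defaultdict

# Hypothetical log records as (date, user_id); the data is illustrative only.
LOGS = [
    ("2014-04-28", "u1"),
    ("2014-04-28", "u2"),
    ("2014-04-29", "u2"),
    ("2014-04-29", "u3"),
    ("2014-04-30", "u1"),
]


def distinct_user_counts(records):
    """Scan every record and return {date: (daily_distinct, cumulative_distinct)}."""
    users_by_day = defaultdict(set)
    for day, user in records:  # every record is read once
        users_by_day[day].add(user)

    seen = set()
    result = {}
    for day in sorted(users_by_day):  # ISO dates sort chronologically
        seen |= users_by_day[day]
        result[day] = (len(users_by_day[day]), len(seen))
    return result


def build_index(records):
    """Build a user_id -> [dates] index, standing in for a keyed store."""
    index = defaultdict(list)
    for day, user in records:
        index[user].append(day)
    return dict(index)


def lookup_user(index, user_id):
    """Keyed lookup that touches a single entry instead of the whole data set."""
    return index.get(user_id, [])
```

The aggregation has to visit all records no matter how the data is stored, while the lookup only benefits from a store that is organized by key, which is the distinction the reply above draws between bulk aggregation and random access.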
