hive-user mailing list archives

From John Sichi <jsi...@fb.com>
Subject Re: Performance between Hive queries vs. Hive over HBase queries
Date Tue, 08 Mar 2011 06:17:51 GMT
For native tables, Hive reads rows directly from HDFS.

For HBase tables, it has to go through the HBase region servers, which reconstruct rows from
column families (combining cache + HDFS).

HBase makes it possible to keep your table up to date in real time, but you have to pay an
overhead cost at query time.

On the other hand, with native Hive tables, there's latency in loading new batches of data.
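
For reference, the two setups being compared could be sketched in HiveQL roughly as follows. This is only an illustration: the table names, the column types, the HBase table name "kv", and the column family "cf" are assumptions, not details from the original poster's environment. Only the key/value layout and the GROUP BY query come from the thread.

```sql
-- Native Hive table: rows live in files on HDFS and are scanned directly.
-- (Table name and column types are hypothetical.)
CREATE TABLE kv_native (key STRING, value STRING);

-- Hive table mapped onto an existing HBase table: scans go through the
-- HBase region servers. The HBase table name "kv" and the column family
-- "cf" are hypothetical placeholders.
CREATE EXTERNAL TABLE kv_hbase (key STRING, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value")
TBLPROPERTIES ("hbase.table.name" = "kv");

-- The query from the thread, run once against each table.
SELECT value, count(*) FROM kv_native GROUP BY value;
SELECT value, count(*) FROM kv_hbase GROUP BY value;
```

Both queries compile to the same kind of MapReduce job; the difference is in how the input rows are read.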

JVS

On Mar 7, 2011, at 10:13 PM, Biju Kaimal wrote:

> Hi,
> 
> Could you please explain the reason for the behavior? 
> 
> Regards,
> Biju
> 
> On Tue, Mar 8, 2011 at 11:35 AM, John Sichi <jsichi@fb.com> wrote:
> Yes.
> 
> JVS
> 
> On Mar 7, 2011, at 9:59 PM, Biju Kaimal wrote:
> 
> > Hi,
> >
> > I loaded a data set which has 1 million rows into both Hive and HBase tables. For
> > the HBase table, I created a corresponding Hive table so that the data in HBase can be
> > queried from Hive QL. Both tables have a key column and a value column.
> >
> > For the same query (select value, count(*) from table group by value), the Hive-only
> > query runs much faster (~ 30 seconds) as compared to Hive over HBase (~ 150 seconds).
> >
> > Is this expected?
> >
> > Regards,
> > Biju
> 
> 

