hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Performance between Hive queries vs. Hive over HBase queries
Date Wed, 09 Mar 2011 21:50:24 GMT
On Wed, Mar 9, 2011 at 4:31 PM, John Sichi <jsichi@fb.com> wrote:
> Factor of 5 closely matches the results I got when I was testing.
>
> JVS
>
> On Mar 9, 2011, at 1:23 PM, Otis Gospodnetic wrote:
>
>> Hi,
>>
>> Biju's example shows a factor of 5 decrease in performance when Hive points to
>> HBase tables.
>>
>> Does anyone know how much this factor varies?  Is it often closer to 1, or is it
>> more often close to 10?
>> Just trying to get a better feel for this...
>>
>> Thanks,
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> ----- Original Message ----
>>> From: John Sichi <jsichi@fb.com>
>>> To: "<user@hive.apache.org>" <user@hive.apache.org>
>>> Sent: Tue, March 8, 2011 1:05:34 AM
>>> Subject: Re: Performance between Hive queries vs. Hive over HBase queries
>>>
>>> Yes.
>>>
>>> JVS
>>>
>>> On Mar 7, 2011, at 9:59 PM, Biju Kaimal  wrote:
>>>
>>>> Hi,
>>>>
>>>> I loaded a data set which has 1 million rows into both Hive and HBase
>>>> tables. For the HBase table, I created a corresponding Hive table so that the
>>>> data in HBase can be queried from Hive QL. Both tables have a key column and a
>>>> value column
>>>>
>>>> For the same query (select value, count(*) from table group by value), the
>>>> Hive only query runs much faster (~ 30 seconds) as compared to Hive over HBase
>>>> (~ 150 seconds).
>>>>
>>>> Is this expected?
>>>>
>>>> Regards,
>>>> Biju
>>>
>>>
>
>
There is going to be overhead: data has to move
HDFS -> RegionServer -> TaskTracker. Another factor is how many
column families your query spans.
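
For reference, the setup Biju describes can be sketched in HiveQL roughly as follows. This is only an illustration: the thread does not give the actual DDL, so the table names (kv_native, kv_hbase), the HBase table name (kv), and the column family and qualifier (cf1:val) are all hypothetical.

```sql
-- Native Hive table: map tasks read the files directly from HDFS.
-- Table and column names are hypothetical.
CREATE TABLE kv_native (key STRING, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Hive table backed by HBase through the HBase storage handler.
-- 'cf1:val' (column family and qualifier) and 'kv' (HBase table name)
-- are assumptions made for illustration.
CREATE TABLE kv_hbase (key STRING, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "kv");

-- The same aggregation run against each table.
SELECT value, count(*) FROM kv_native GROUP BY value;
SELECT value, count(*) FROM kv_hbase GROUP BY value;
```

With the timings reported in the thread (~30 seconds native, ~150 seconds over HBase), the second query would be the one running about 5x slower. The sketch maps a single column family, which is the simplest case for the column-family point.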
