hive-user mailing list archives

From Abhishek kumar <abhishekiit...@gmail.com>
Subject Re: Hive being slow
Date Sun, 11 Jan 2015 00:54:46 GMT
First, I ran the query: select * from table1 where id = 'value';
It was very fast, as expected, since HBase returned the result quickly. In
this case, I observed that no MapReduce tasks were spawned.

Now, for the query select * from table1 where id > 'zzz', I expected the
filter pushdown to happen (at least, that is what the 0.14 code suggests).
Since no rows match, HBase should again respond very quickly, and Hive
should return the (empty) result very quickly too. But this is not
happening: from the DataNode logs, it looks like a lot of reads are taking
place (close to a full scan of the 10 GB of data). I expected the response
time to be very close to that of the first query.
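To illustrate why a pushed-down start key should make this query cheap, here is a minimal Python sketch that models an HBase table as a sorted list of byte-string row keys (an assumption for illustration; real HBase seeks within sorted store files, not a Python list). The functions scan_from and scan_with_client_filter and the table contents are hypothetical. Seeking to b"zzz\x00", the smallest key that sorts after b"zzz", examines no rows, whereas filtering after a full scan examines every row, which is consistent with the heavy reads seen in the DataNode logs.

```python
import bisect

def scan_from(sorted_keys, start_row):
    """Seek to start_row and read to the end of the table.

    Returns (rows, rows_examined). Only rows at or after start_row are
    touched, as with an HBase scan that has a start row set.
    """
    i = bisect.bisect_left(sorted_keys, start_row)
    rows = sorted_keys[i:]
    return rows, len(rows)

def scan_with_client_filter(sorted_keys, predicate):
    """Read every row and filter afterwards (no pushdown).

    Returns (matching_rows, rows_examined).
    """
    rows = [k for k in sorted_keys if predicate(k)]
    return rows, len(sorted_keys)

# Hypothetical table: 100,000 row keys b"row000000" .. b"row099999".
keys = sorted(b"row%06d" % n for n in range(100_000))

# id > 'zzz': b"zzz\x00" is the smallest key that sorts after b"zzz".
print(scan_from(keys, b"zzz\x00"))                          # ([], 0)
print(scan_with_client_filter(keys, lambda k: k > b"zzz"))  # ([], 100000)
```

Both calls return the same empty result; the whole cost difference lies in rows_examined.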

I will check the number of tasks being launched.

My questions are:
* Why is no filter pushdown (id > 'zzz') happening for this very simple
query?
* Since this query can be answered entirely by HBase, will Hive launch map
tasks at all? (Last time, I think I observed no map tasks being launched.)
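Ashutosh's suggestion (quoted below) of comparing task counts works because HBase's TableInputFormat creates roughly one input split, and hence one map task, per region whose key range overlaps the scan range. The Python sketch below models that overlap check; it is my reading of TableInputFormatBase.getSplits, not its actual code, and overlapping_regions and the four region boundaries are hypothetical. Under this model, a start key past every row still overlaps the last region (whose empty end key means "unbounded"), so even with working pushdown one would expect one task rather than zero.

```python
def overlapping_regions(regions, start_row=b"", stop_row=b""):
    """Return the regions whose [start, end) range intersects [start_row, stop_row).

    As with HBase region boundaries and scan start/stop rows, an empty
    bytes value means "unbounded".
    """
    selected = []
    for region_start, region_end in regions:
        begins_before_region_end = (
            start_row == b"" or region_end == b"" or start_row < region_end
        )
        ends_after_region_start = stop_row == b"" or stop_row > region_start
        if begins_before_region_end and ends_after_region_start:
            selected.append((region_start, region_end))
    return selected

# Hypothetical four-region table.
regions = [
    (b"", b"row025000"),
    (b"row025000", b"row050000"),
    (b"row050000", b"row075000"),
    (b"row075000", b""),
]

print(len(overlapping_regions(regions)))                        # 4 splits: full scan
print(len(overlapping_regions(regions, start_row=b"zzz\x00")))  # 1 split: last region only
```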

--
Abhishek

On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <hashutosh@apache.org>
wrote:

> Hi Abhishek,
>
> How are you determining that it is resulting in a full table scan? One way
> to ascertain that the filter got pushed down is to see how many tasks were
> launched for your query, with and without the filter. One would expect a
> lower number of splits (and thus tasks) for the query with the filter.
>
> Thanks,
> Ashutosh
>
> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <abhishekiitg10@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am using Hive 0.14 running over HBase (with ~10 GB of data). I am
>> facing slowness when querying HBase. My query looks like the
>> following:
>>
>> select * from table1 where id > 'zzzz';  (id is the row-key)
>>
>> As per the Hive code, id > 'zzz' is pushed down to the HBase scanner as
>> the 'startKey'. Given that no row keys (id) satisfy this criterion, this
>> query should be extremely fast. But Hive is taking a lot of time; it looks
>> like a full HBase table scan.
>> Can someone tell me where I am going wrong in my understanding?
>>
>> --
>> Abhishek
>>
>
>
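
The premise of the original question, that id > 'zzz' reaches the HBase scanner as a start key, amounts to translating a comparison on the row key into a half-open [startRow, stopRow) range. A minimal Python sketch of that translation follows; key_range is a hypothetical helper, not Hive's implementation, and it assumes the row key is a string compared in byte order, which is how HBase orders row keys.

```python
def key_range(op, value):
    """Translate 'rowkey <op> value' into a half-open (start_row, stop_row) range.

    b"" means unbounded. value + b"\\x00" is the smallest byte string that
    sorts strictly after value, which turns inclusive/exclusive bounds into
    HBase-style [start, stop) bounds.
    """
    successor = value + b"\x00"
    if op == "=":
        return value, successor
    if op == ">":
        return successor, b""
    if op == ">=":
        return value, b""
    if op == "<":
        return b"", value
    if op == "<=":
        return b"", successor
    raise ValueError(f"unsupported operator: {op!r}")

print(key_range(">", b"zzz"))    # (b'zzz\x00', b'')
print(key_range("=", b"value"))  # (b'value', b'value\x00')
```

For the query in this thread, the range starts just after 'zzz' and has no stop row, so a correctly configured scan has nothing to read.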
