hive-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Hive being slow
Date Wed, 14 Jan 2015 20:29:07 GMT
Can you set the following config:

hive> set hive.fetch.task.conversion=none;

and then run both of your queries again? Let's see if this makes a
difference. I expect this will cause an MR job to be launched, so the
runtimes might be different.
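A minimal sketch of the suggested experiment, assuming the table and column names used elsewhere in this thread (table1, id). The EXPLAIN step and the final reset are additions not in the original message; on a table backed by HBase, the plan's TableScan operator may list a filterExpr entry when the predicate is handed to the storage handler, but the exact plan output depends on the Hive version.

```sql
-- Disable fetch-task conversion for this session, so that simple
-- SELECT queries are compiled to MapReduce jobs instead of a direct fetch.
set hive.fetch.task.conversion=none;

-- The two queries from the thread; compare runtimes and the number of
-- map tasks launched for each.
select * from table1 where id = 'value';
select * from table1 where id > 'zzz';

-- Optional (an addition, not part of the original suggestion):
-- inspect the plan to see how the predicate on the row key is handled.
explain select * from table1 where id > 'zzz';

-- Restore the setting afterwards ("more" is, as far as I know, the
-- default value in Hive 0.14).
set hive.fetch.task.conversion=more;
```

If the filtered query launches noticeably fewer map tasks than an unfiltered select * from table1, that would suggest the row-key predicate is narrowing the HBase scan range.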

On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <abhishekiitg10@gmail.com>
wrote:

> First I tried running the query: select * from table1 where id = 'value';
> It was very fast, as expected, since HBase returned the results quickly.
> In this case, I observed no map/reduce task getting spawned.
>
> Now, for the query select * from table1 where id > 'zzz', I expected the
> filter pushdown to happen (at least that is what the 0.14 code suggests).
> Since no rows match, HBase should again reply very quickly, and Hive
> should therefore return the query's result very quickly. But this is not
> happening, and from the datanode logs it looks like a lot of reads are
> happening (close to a full table scan of 10 GB of data). I expected the
> response time to be very close to that of the query above.
>
> I will check the number of tasks getting launched.
>
> My questions are:
> * Why was there no filter pushdown (id > 'zzz') for this very simple
> query?
> * Since this query can be resolved entirely from HBase, will Hive launch
> map tasks? (Last time, I think I observed no map task getting launched.)
>
> --
> Abhishek
>
> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <hashutosh@apache.org>
> wrote:
>
>> Hi Abhishek,
>>
>> How are you determining that it's resulting in a full table scan? One way
>> to ascertain that the filter got pushed down is to see how many tasks were
>> launched for your query, with and without the filter. One would expect a
>> lower number of splits (and thus tasks) for the query with the filter.
>>
>> Thanks,
>> Ashutosh
>>
>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <abhishekiitg10@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using Hive 0.14 running over HBase (with ~10 GB of data). I am
>>> seeing slow queries against HBase. My query looks like the
>>> following:
>>>
>>> select * from table1 where id > 'zzzz';  (id is the row-key)
>>>
>>> As per the Hive code, id > 'zzz' is pushed to the HBase scanner as the
>>> 'startKey'. Given that there are no row keys (id) that satisfy this
>>> criterion, this query should be extremely fast. But Hive is taking a lot
>>> of time; it looks like a full HBase table scan.
>>> Can someone let me know where I am wrong in my understanding of the
>>> whole thing?
>>>
>>> --
>>> Abhishek
>>>
>>
>>
>
