hive-user mailing list archives

From Abhishek kumar <>
Subject Re: Hive being slow
Date Thu, 15 Jan 2015 08:44:45 GMT

Thanks for the reply.

I tried that, but no luck. The MapReduce job seems to be stuck (it is taking a
lot of time for just 65 lakh, i.e. 6.5 million, HBase rows). I am attaching the log file (or

My only question is why the filter push-down for the row key (*startKey* and
*stopKey* for the *Scanner*) is not reaching HBase. If the push-down
happened, HBase would resolve this Scanner very quickly, and the query would
return very quickly whether or not an MR job runs.
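
To make the expectation concrete, here is a minimal Python sketch of the difference I have in mind. It is not Hive or HBase code: the function names and the sample row keys are hypothetical, and a sorted Python list stands in for HBase's sorted row keys. With a start key, the scan can seek directly to the first candidate row; without it, every row is read and filtered afterwards.

```python
from bisect import bisect_right

def scan_with_start_key(sorted_keys, lower_bound):
    """Simulate push-down: seek straight to the first key > lower_bound."""
    start = bisect_right(sorted_keys, lower_bound)
    rows_read = len(sorted_keys) - start
    return sorted_keys[start:], rows_read

def scan_without_pushdown(sorted_keys, lower_bound):
    """Simulate no push-down: read every row, then filter on the client side."""
    rows_read = 0
    matches = []
    for key in sorted_keys:
        rows_read += 1
        if key > lower_bound:
            matches.append(key)
    return matches, rows_read

# Hypothetical row keys. Python compares ASCII strings in the same
# lexicographic order that HBase uses for the equivalent bytes.
row_keys = sorted(f"id{i:05d}" for i in range(10000))

pushed = scan_with_start_key(row_keys, "zzz")
full = scan_without_pushdown(row_keys, "zzz")
print("with start key:   matches=%d rows_read=%d" % (len(pushed[0]), pushed[1]))
print("without pushdown: matches=%d rows_read=%d" % (len(full[0]), full[1]))
```

Both scans return no rows for id > 'zzz', but only the second one reads the whole table, which is the behaviour I seem to be observing.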


On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan <> wrote:

> Can you run your query with the following config:
> hive> set hive.fetch.task.conversion=none;
> and run your two queries with this. Let's see if this makes a difference.
> My expectation is that this will result in an MR job getting launched, and
> thus the runtimes might be different.
> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <>
> wrote:
>> First I tried running the query: select * from table1 where id = 'value';
>> It was very fast, as expected, since HBase returned the results very fast.
>> In this case, I observed no map/reduce task getting spawned.
>> Now, for the query select * from table1 where id > 'zzz', I expected
>> the filter push-down to happen (at least the 0.14 code says so). And since
>> no results were found, HBase should again reply very fast, and thus
>> Hive should output the query's result very fast. But this is not
>> happening, and from the datanode logs it looks like a lot of reads are
>> happening (close to a full table scan of 10 GB of data). I expected the
>> response time to be very close to the above query's time.
>> I will check the number of tasks getting launched.
>> My questions are:
>> * Why was there no filter push-down (id > 'zzz') happening for this
>> very simple query?
>> * Since this query can be resolved from HBase alone, will Hive launch map
>> tasks? (Last time, I believe I observed no map task getting launched.)
>> --
>> Abhishek
>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <>
>> wrote:
>>> Hi Abhishek,
>>> How are you determining it's resulting in a full table scan? One way to
>>> ascertain that the filter got pushed down is to see how many tasks were
>>> launched for your query, with and without the filter. One would expect a
>>> lower number of splits (and thus tasks) for the query with the filter.
>>> Thanks,
>>> Ashutosh
>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <>
>>> wrote:
>>>> Hi,
>>>> I am using Hive 0.14, which runs over HBase (having ~10 GB of data). I
>>>> am facing slowness when querying over HBase. My query
>>>> looks like the following:
>>>> select * from table1 where id > 'zzzz';  (id is the row-key)
>>>> As per the Hive code, id > 'zzz' is getting pushed to the HBase scanner as
>>>> 'startKey'. Now, given that there are no row keys (id) that satisfy this
>>>> criterion, this query should be extremely fast. But Hive is taking a lot of
>>>> time; it looks like a full HBase table scan.
>>>> Can someone let me know where I am wrong in understanding the whole
>>>> thing?
>>>> --
>>>> Abhishek
