hive-user mailing list archives

From Abhishek kumar <abhishekiit...@gmail.com>
Subject Re: Hive being slow
Date Thu, 15 Jan 2015 17:18:33 GMT
0.14.0

--
Abhishek

On Thu, Jan 15, 2015 at 10:43 PM, Ashutosh Chauhan <hashutosh@apache.org>
wrote:

> Which Hive version are you using?
>
> On Thu, Jan 15, 2015 at 12:44 AM, Abhishek kumar
> <abhishekiitg10@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the reply.
>>
>> I tried that, but no luck. The map-reduce job seems to be stuck (taking a
>> lot of time for just 65 lakh (6.5 million) HBase rows). I am attaching the
>> log file (also at http://pastebin.com/BUYDUiEu).
>>
>> My only question is why the filter push-down for the row key (*startKey* and
>> *stopKey* for the *Scanner*) is not reaching HBase. If the push-down
>> happened, HBase would resolve this Scanner very quickly, and the query would
>> return quickly whether or not an MR job runs.
>>
>> --
>> Abhishek
>>
>> On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan <hashutosh@apache.org>
>> wrote:
>>
>>> Can you run your query with the following config:
>>>
>>> hive> set hive.fetch.task.conversion=none;
>>>
>>> and run your two queries with this. Let's see if this makes a difference.
>>> My expectation is that this will result in an MR job getting launched, and
>>> thus runtimes might be different.
>>>
>>> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar
>>> <abhishekiitg10@gmail.com> wrote:
>>>
>>>> First I tried running the query: select * from table1 where id = 'value';
>>>> It was very fast, as expected, since HBase returned the results very
>>>> quickly. In this case, I observed no map/reduce task getting spawned.
>>>>
>>>> Now, for the query select * from table1 where id > 'zzz', I expected
>>>> the filter push-down to happen (at least that is what the 0.14 code says).
>>>> Since no rows match, HBase should again reply very quickly, and Hive
>>>> should return the query's result just as quickly. But this is not
>>>> happening, and from the datanode logs it looks like a lot of reads are
>>>> happening (close to a full table scan of 10 GB of data). I expected the
>>>> response time to be very close to that of the query above.
>>>>
>>>> I will check the number of tasks getting launched.
>>>>
>>>> My questions are:
>>>> * Why was there no filter push-down (id > 'zzz') for this very simple
>>>> query?
>>>> * Since this query can be resolved entirely from HBase, will Hive launch
>>>> map tasks? (Last time, I believe I observed no map task getting launched.)
>>>> --
>>>> Abhishek
>>>>
>>>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan
>>>> <hashutosh@apache.org> wrote:
>>>>
>>>>> Hi Abhishek,
>>>>>
>>>>> How are you determining that it results in a full table scan? One way to
>>>>> ascertain that the filter got pushed down is to see how many tasks were
>>>>> launched for your query, with and without the filter. One would expect a
>>>>> lower number of splits (and thus tasks) for the query with the filter.
>>>>>
>>>>> Thanks,
>>>>> Ashutosh
>>>>>
>>>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar
>>>>> <abhishekiitg10@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using Hive 0.14 running over HBase (with ~10 GB of data). I am
>>>>>> facing slowness when querying over HBase. My query looks like the
>>>>>> following:
>>>>>>
>>>>>> select * from table1 where id > 'zzzz';  (id is the row-key)
>>>>>>
>>>>>> As per the Hive code, id > 'zzz' is getting pushed to the HBase scanner
>>>>>> as 'startKey'. Given that there are no row keys (id) that satisfy
>>>>>> this criterion, this query should be extremely fast. But Hive is taking a
>>>>>> lot of time; it looks like a full HBase table scan.
>>>>>> Can someone let me know where I am wrong in my understanding of the whole
>>>>>> thing?
>>>>>>
>>>>>> --
>>>>>> Abhishek
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
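
Note on the row-key reasoning in this thread: the expectation that
id > 'zzz' should be fast rests on HBase scans being bounded by an inclusive
start row and an exclusive stop row over lexicographically sorted keys. The
sketch below only illustrates that mapping; it is not Hive's or HBase's actual
code. The names scan_range and rows_touched and the sample keys are
hypothetical, and a sorted Python list stands in for a table.

```python
from bisect import bisect_left

def scan_range(op, value):
    """Map `rowkey <op> value` to (start_row, stop_row).

    Assumes HBase-style bounds: start inclusive, stop exclusive,
    and b"" meaning "unbounded" on that side.
    """
    # Smallest byte string that sorts strictly after `value`.
    successor = value + b"\x00"
    if op == "=":
        return value, successor
    if op == ">":
        return successor, b""
    if op == ">=":
        return value, b""
    if op == "<":
        return b"", value
    if op == "<=":
        return b"", successor
    raise ValueError("unsupported operator: " + op)

def rows_touched(sorted_keys, start, stop):
    """Return the keys a bounded scan would read from a sorted key list."""
    lo = bisect_left(sorted_keys, start) if start else 0
    hi = bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[lo:hi]

# Hypothetical row keys, kept sorted as HBase would store them.
keys = [b"a1", b"m5", b"zz"]

start, stop = scan_range(">", b"zzz")
bounded = rows_touched(keys, start, stop)   # push-down applied
unbounded = rows_touched(keys, b"", b"")    # no push-down: full scan
```

With the bounds applied, the scan for id > 'zzz' reads no rows at all, while
the unbounded scan reads every key. That gap is the difference between the
fast response expected in the thread and the near-full-table read that was
observed.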
