hive-dev mailing list archives

From Navis류승우 <>
Subject Re: non map-reduce for simple queries
Date Sun, 29 Jul 2012 01:17:12 GMT
I was thinking of a timeout for fetching, 2000 msec for example. How about

On Sunday, July 29, 2012, Edward Capriolo <> wrote:
> If the where condition is too complex, selecting specific columns seems
> enough and useful.
> On Saturday, July 28, 2012, Namit Jain <> wrote:
>> Currently, Hive does not launch map-reduce jobs for the following queries:
>> select * from <T> where <condition on partition columns> (limit <n>)?
>> This behavior is not configurable, and cannot be altered.
>> HIVE-2925 wants to extend this behavior. The goal is not to spawn
>> map-reduce jobs for the following queries:
>> Select <expr> from <T> where <any condition> (limit <n>)?
>> It is currently controlled by one parameter,
>> hive.aggressive.fetch.task.conversion, which decides whether or not to
>> spawn map-reduce jobs for queries of the above type. Note that this can
>> be beneficial for certain types of queries, since it avoids the
>> expensive step of spawning map-reduce. However, it can be pretty
>> expensive for other types of queries: selecting a very large number of
>> rows, a query with a very selective filter (which is satisfied by a very
>> small number of rows, and therefore involves scanning a very large
>> table), etc. The user does not have any control over this. Note that it
>> cannot be done by hooks, since the pre-semantic hooks do not have enough
>> information (type of the query, inputs, etc.) and it is too late to do
>> anything in the post-semantic hook (the query plan has already been
>> altered).
>> I would like to propose the following configuration parameters to control
>> this behavior.
>> hive.fetch.task.conversion: true, false, auto
>> If the value is true, then all queries with only selects and filters will
>> be converted.
>> If the value is false, then no query will be converted.
>> If the value is auto (which should be the default behavior), there should
>> be additional parameters to control the semantics.
>>               ---> integer value X1
>>      ---> integer value X2
>> If either the query has a limit lower than X1, or the input size is
>> smaller than X2, the queries containing only filters and selects will be
>> converted to not use map-reduce jobs.
>> Comments…
>> -namit
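
The three-valued setting proposed in the quoted message can be read as a small decision function. Below is a minimal sketch of that reading, not Hive's implementation: the names SimpleQuery, should_convert, x1 and x2 are hypothetical, the two threshold parameters are referred to only as X1 and X2 as in the message, and measuring input size in bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleQuery:
    """Hypothetical summary of a query containing only selects and filters."""
    limit: Optional[int]     # value of LIMIT <n>, or None when there is no limit
    input_size_bytes: int    # total size of the inputs (bytes is an assumption)


def should_convert(mode: str, query: SimpleQuery, x1: int, x2: int) -> bool:
    """Return True when the query should run without a map-reduce job.

    mode is the proposed hive.fetch.task.conversion value;
    x1 and x2 stand for the proposed thresholds X1 (limit) and X2 (input size).
    """
    if mode == "true":
        return True          # always convert select/filter-only queries
    if mode == "false":
        return False         # never convert
    if mode == "auto":
        small_limit = query.limit is not None and query.limit < x1
        small_input = query.input_size_bytes < x2
        return small_limit or small_input
    raise ValueError(f"unknown conversion mode: {mode!r}")


if __name__ == "__main__":
    q = SimpleQuery(limit=10, input_size_bytes=5_000_000_000)
    print(should_convert("auto", q, x1=100, x2=1_000_000))
```

Under this reading, a query with no limit over a large input is left as a map-reduce job in auto mode, which covers the expensive case of a very selective filter over a very large table.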
