hive-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Is YSmart integrated into Hive on tez ?
Date Wed, 02 Sep 2015 02:28:43 GMT
+ dev mailing list

The original correlation optimization may have been designed for the MR
engine, but a similar optimization could be applied to Tez too. Is there an
existing JIRA that tracks this?



On Tue, Sep 1, 2015 at 1:58 PM, Jeff Zhang <zjffdu@gmail.com> wrote:

> Hi Pengcheng,
>
> Is there a reason why the correlation optimization is disabled in Tez?
>
> Even when I change the code to enable the correlation optimization in
> Tez, I still get the same query plan.
>
> >>> Vertex dependency in root stage
> >>> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
> >>> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>
> On Tue, Sep 1, 2015 at 1:14 AM, Pengcheng Xiong <pxiong@apache.org> wrote:
>
>> Hi Jeff,
>>
>>      From the code base point of view, YSmart is integrated into Hive on
>> Tez because it is one of the optimizations in current Hive. However, from
>> the execution point of view, it is currently disabled when Hive is running
>> on Tez. You may take a look at the Hive source code:
>>
>> Optimizer.java, L175-180:
>> {code}
>> if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEOPTCORRELATION) &&
>>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVEGROUPBYSKEW) &&
>>     !HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME) &&
>>     !isTezExecEngine) {
>>   transformations.add(new CorrelationOptimizer());
>> }
>> {code}
>>
>> Hope it helps.
>>
>> Best
>> Pengcheng Xiong
>>
>>
>> On Mon, Aug 31, 2015 at 12:56 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>>> The reason I ask is that when I execute the following SQL, it generates
>>> a query plan with 4 vertices. As I understand it, if YSmart were
>>> integrated into Hive, the plan should need only 3 vertices, since the
>>> join key and the group-by key are the same. Does anybody know about
>>> this? Thanks
>>>
>>>
>>> insert overwrite directory '/tmp/jzhang/1'
>>> select o.o_orderkey as orderkey, count(1)
>>> from lineitem l
>>> join orders o on l.l_orderkey = o.o_orderkey
>>> group by o.o_orderkey;
>>>
>>> *YSmart Hive Jira*
>>>
>>> https://issues.apache.org/jira/browse/HIVE-2206
>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best Regards

Jeff Zhang
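
For readers following the thread, the guard quoted from Optimizer.java can be
read as a plain boolean predicate. The sketch below is only an illustration:
the class name, method name, and parameter names are hypothetical and are not
part of Hive. It assumes the four inputs correspond to the three HiveConf
flags in the quoted snippet plus the isTezExecEngine check.

```java
public class CorrelationGuardSketch {

    /**
     * Hypothetical restatement of the quoted condition. Parameters stand in for:
     *   optCorrelation      - HiveConf.ConfVars.HIVEOPTCORRELATION
     *   groupBySkew         - HiveConf.ConfVars.HIVEGROUPBYSKEW
     *   skewJoinCompileTime - HiveConf.ConfVars.HIVE_OPTIMIZE_SKEWJOIN_COMPILETIME
     *   isTezExecEngine     - whether the execution engine is Tez
     */
    public static boolean shouldAddCorrelationOptimizer(boolean optCorrelation,
                                                        boolean groupBySkew,
                                                        boolean skewJoinCompileTime,
                                                        boolean isTezExecEngine) {
        // The optimizer is added only when it is enabled and none of the
        // three excluding conditions holds.
        return optCorrelation
                && !groupBySkew
                && !skewJoinCompileTime
                && !isTezExecEngine;
    }

    public static void main(String[] args) {
        // Same flags, different engine: only the non-Tez case passes the guard.
        System.out.println("mr:  " + shouldAddCorrelationOptimizer(true, false, false, false));
        System.out.println("tez: " + shouldAddCorrelationOptimizer(true, false, false, true));
    }
}
```

Under these assumptions, enabling the correlation flag is not enough on Tez,
because the last term of the condition is always false there. The sketch only
covers whether the transformation is registered; it says nothing about why the
plan stayed the same after the guard was changed locally.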
