hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16600) Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in multi_insert cases
Date Mon, 12 Jun 2017 08:10:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046307#comment-16046307
] 

liyunzhang_intel edited comment on HIVE-16600 at 6/12/17 8:09 AM:
------------------------------------------------------------------

[~lirui]:  changes in HIVE-16600.11.patch, please help review.
1. check all branches to verify whether there is LIMIT after RS, once condition matches, return
true( this is an order by limit case), If TermialOperator is encountered, do not search its
children any more.
2. add a new unit test like "select * from A order by columnB limit N"
3. remove case like following as after HIVE-6348, the physical plan of following query will
be:
query:
 {code}
FROM (select key,value from src order by key) a
INSERT OVERWRITE TABLE e1
   SELECT key
INSERT OVERWRITE TABLE e2
   SELECT key limit xx;
{code}

physical plan:
{code}
TS[0]-SEL[1]-FS[3]
       -LIM[5]-RS[6]-SEL[7]-LIM[8]-FS[9]
{code}
we judge above case in the same way we do in "select * from A order by columnB limit N)"



was (Author: kellyzly):
[~lirui]:  changes in HIVE-16600.11.patch
1. check all branches to verify whether there is LIMIT after RS, once condition matches, return
true( this is an order by limit case), If TermialOperator is encountered, do not search its
children any more.
2. add a new unit test like "select * from A order by columnB limit N"
3. remove case like following as after HIVE-6348, the physical plan of following query will
be:
query:
 {code}
FROM (select key,value from src order by key) a
INSERT OVERWRITE TABLE e1
   SELECT key
INSERT OVERWRITE TABLE e2
   SELECT key limit xx;
{code}

physical plan:
{code}
TS[0]-SEL[1]-FS[3]
       -LIM[5]-RS[6]-SEL[7]-LIM[8]-FS[9]
{code}
we judge above case in the same way we do in "select * from A order by columnB limit N)"


> Refactor SetSparkReducerParallelism#needSetParallelism to enable parallel order by in
multi_insert cases
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16600
>                 URL: https://issues.apache.org/jira/browse/HIVE-16600
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-16600.10.patch, HIVE-16600.11.patch, HIVE-16600.1.patch, HIVE-16600.2.patch,
HIVE-16600.3.patch, HIVE-16600.4.patch, HIVE-16600.5.patch, HIVE-16600.6.patch, HIVE-16600.7.patch,
HIVE-16600.8.patch, HIVE-16600.9.patch, mr.explain, mr.explain.log.HIVE-16600, Node.java,
TestSetSparkReduceParallelism_MultiInsertCase.java
>
>
> multi_insert_gby.case.q
> {code}
> set hive.exec.reducers.bytes.per.reducer=256;
> set hive.optimize.sampling.orderby=true;
> drop table if exists e1;
> drop table if exists e2;
> create table e1 (key string, value string);
> create table e2 (key string);
> FROM (select key, cast(key as double) as keyD, value from src order by key) a
> INSERT OVERWRITE TABLE e1
>     SELECT key, value
> INSERT OVERWRITE TABLE e2
>     SELECT key;
> select * from e1;
> select * from e2;
> {code} 
> the parallelism of Sort is 1 even we enable parallel order by("hive.optimize.sampling.orderby"
is set as "true").  This is not reasonable because the parallelism  should be calcuated by
 [Utilities.estimateReducers|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L170]
> this is because SetSparkReducerParallelism#needSetParallelism returns false when [children
size of RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L207]
is greater than 1.
> in this case, the children size of {{RS[2]}} is two.
> the logical plan of the case
> {code}
>    TS[0]-SEL[1]-RS[2]-SEL[3]-SEL[4]-FS[5]
>                             -SEL[6]-FS[7]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message