hive-issues mailing list archives

From "Chao Sun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16668) Hive on Spark generates incorrect plan and result with window function and lateral view
Date Wed, 17 May 2017 14:44:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HIVE-16668:
----------------------------
    Attachment: HIVE-16668.2.patch

Attaching patch v2 with a different approach. v1 failed to handle some union + multi-insert
cases.

> Hive on Spark generates incorrect plan and result with window function and lateral view
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-16668
>                 URL: https://issues.apache.org/jira/browse/HIVE-16668
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-16668.1.patch, HIVE-16668.2.patch
>
>
> To reproduce:
> {code}
> create table t1 (a string);
> create table t2 (a array<string>);
> create table dummy (a string);
> insert into table dummy values ("a");
> insert into t1 values ("1"), ("2");
> insert into t2 select array("1", "2", "3", "4") from dummy;
> set hive.auto.convert.join.noconditionaltask.size=3;
> explain
> with tt1 as (
>   select a as id, count(*) over () as count
>   from t1
> ),
> tt2 as (
>   select id
>   from t2
>   lateral view outer explode(a) a_tbl as id
> )
> select tt1.count
> from tt1 join tt2 on tt1.id = tt2.id;
> {code}
> For Hive on Spark, the plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 3), Map 1 (PARTITION-LEVEL SORT, 3)
>       DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:9
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: t1
>                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: 0 (type: int)
>                     sort order: +
>                     Map-reduce partition columns: 0 (type: int)
>                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: a (type: string)
>         Reducer 2
>             Local Work:
>               Map Reduce Local Work
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col0 (type: string)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                 PTF Operator
>                   Function definitions:
>                       Input definition
>                         input alias: ptf_0
>                         output shape: _col0: string
>                         type: WINDOWING
>                       Windowing table definition
>                         input alias: ptf_1
>                         name: windowingtablefunction
>                         order by: 0 ASC NULLS FIRST
>                         partition by: 0
>                         raw input shape:
>                         window functions:
>                             window function definition
>                               alias: count_window_0
>                               name: count
>                               window function: GenericUDAFCountEvaluator
>                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)
>                               isStar: true
>                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: _col0 is not null (type: boolean)
>                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string), count_window_0 (type: bigint)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                       Spark HashTable Sink Operator
>                         keys:
>                           0 _col0 (type: string)
>                           1 _col0 (type: string)
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>   Stage: Stage-1
>     Spark
>       DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:8
>       Vertices:
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: t2
>                   Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                   Lateral View Forward
>                     Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                       Lateral View Join Operator
>                         outputColumnNames: _col4
>                         Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                         Select Operator
>                           expressions: _col4 (type: string)
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                           Map Join Operator
>                             condition map:
>                                  Inner Join 0 to 1
>                             keys:
>                               0 _col0 (type: string)
>                               1 _col0 (type: string)
>                             outputColumnNames: _col1
>                             input vertices:
>                               0 Reducer 2
>                             Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                             Select Operator
>                               expressions: _col1 (type: bigint)
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                               File Output Operator
>                                 compressed: false
>                                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                 table:
>                                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     Select Operator
>                       expressions: a (type: array<string>)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                       UDTF Operator
>                         Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                         function name: explode
>                         outer lateral view: true
>                         Filter Operator
>                           predicate: col is not null (type: boolean)
>                           Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                           Lateral View Join Operator
>                             outputColumnNames: _col4
>                             Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                             Select Operator
>                               expressions: _col4 (type: string)
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                               Map Join Operator
>                                 condition map:
>                                      Inner Join 0 to 1
>                                 keys:
>                                   0 _col0 (type: string)
>                                   1 _col0 (type: string)
>                                 outputColumnNames: _col1
>                                 input vertices:
>                                   0 Reducer 2
>                                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                 Select Operator
>                                   expressions: _col1 (type: bigint)
>                                   outputColumnNames: _col0
>                                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                   File Output Operator
>                                     compressed: false
>                                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                     table:
>                                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             Local Work:
>               Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Note that {{Map 1}} appears twice as an input to {{Reducer 2}}.
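
The duplicated input can also be spotted mechanically. The following is a minimal Python sketch, assuming the Edges entry from the EXPLAIN output has been rejoined onto a single line; `duplicate_parents` is a hypothetical helper written for illustration and is not part of Hive.

```python
import re
from collections import Counter


def duplicate_parents(edge_line):
    """Return parent vertices that appear more than once on one 'Edges' line."""
    # Everything to the right of "<-" is the comma-separated parent list.
    _child, _arrow, parents = edge_line.partition("<-")
    # Each parent looks like "Map 1 (PARTITION-LEVEL SORT, 3)"; keep only the vertex name.
    names = re.findall(r"((?:Map|Reducer) \d+) \(", parents)
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# The Edges entry from the plan in this report.
edge = "Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 3), Map 1 (PARTITION-LEVEL SORT, 3)"
print(duplicate_parents(edge))
```

For the plan in this report the helper reports `Map 1`, matching the observation that the same map vertex feeds the reducer twice.
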
> On Hive on Spark, the query returns:
> {code}
> 4
> 4
> 4
> 4
> {code}
> which is incorrect.
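
As a sanity check on what the correct output should be, here is a plain-Python sketch of the query's semantics over the same data. It is not Hive code, and `expected_counts` is a hypothetical name. It assumes standard SQL behavior: `count(*) over ()` gives every row the whole-table count, `lateral view outer explode` emits one row per array element, and an inner join never matches NULL keys.

```python
# Table contents from the reproduction steps.
t1 = ["1", "2"]
t2 = [["1", "2", "3", "4"]]


def expected_counts(t1_rows, t2_rows):
    """Model: select tt1.count from tt1 join tt2 on tt1.id = tt2.id."""
    # count(*) over (): every row of t1 sees the total row count of t1.
    total = len(t1_rows)
    tt1 = [(a, total) for a in t1_rows]
    # lateral view outer explode(a): one row per element, or a single NULL
    # row when the array is empty or NULL.
    tt2 = [elem for arr in t2_rows for elem in (arr if arr else [None])]
    # Inner join on id; NULL ids never match.
    return [count for (id1, count) in tt1
            for id2 in tt2 if id2 is not None and id1 == id2]


print(expected_counts(t1, t2))  # [2, 2]
```

Under these assumptions only ids 1 and 2 match, each carrying a window count of 2, so two rows of 2 would be expected (row order aside). The four rows of 4 reported above would be consistent with the rows of t1 reaching the reducer twice, though that reading is an inference from the plan rather than something the report states.
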



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
