Date: Fri, 20 Jul 2018 16:23:00 +0000 (UTC)
From: "Sahil Takiar (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-16668) Hive on Spark generates incorrect plan and result with window function and lateral view

    [ https://issues.apache.org/jira/browse/HIVE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550943#comment-16550943 ]

Sahil Takiar commented on HIVE-16668:
-------------------------------------

[~csun] do you know if this is still an issue? Is there anything else that needs to be done for this JIRA (besides probably a rebase) before merging it?
> Hive on Spark generates incorrect plan and result with window function and lateral view
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-16668
>                 URL: https://issues.apache.org/jira/browse/HIVE-16668
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HIVE-16668.1.patch, HIVE-16668.2.patch, HIVE-16668.3.patch
>
>
> To reproduce:
> {code}
> create table t1 (a string);
> create table t2 (a array<string>);
> create table dummy (a string);
> insert into table dummy values ("a");
> insert into t1 values ("1"), ("2");
> insert into t2 select array("1", "2", "3", "4") from dummy;
>
> set hive.auto.convert.join.noconditionaltask.size=3;
>
> explain
> with tt1 as (
>   select a as id, count(*) over () as count
>   from t1
> ),
> tt2 as (
>   select id
>   from t2
>   lateral view outer explode(a) a_tbl as id
> )
> select tt1.count
> from tt1 join tt2 on tt1.id = tt2.id;
> {code}
> For Hive on Spark, the plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 3), Map 1 (PARTITION-LEVEL SORT, 3)
>       DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:9
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: t1
>                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: 0 (type: int)
>                     sort order: +
>                     Map-reduce partition columns: 0 (type: int)
>                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: a (type: string)
>         Reducer 2
>             Local Work:
>               Map Reduce Local Work
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col0 (type: string)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                 PTF Operator
>                   Function definitions:
>                       Input definition
>                         input alias: ptf_0
>                         output shape: _col0: string
>                         type: WINDOWING
>                       Windowing table definition
>                         input alias: ptf_1
>                         name: windowingtablefunction
>                         order by: 0 ASC NULLS FIRST
>                         partition by: 0
>                         raw input shape:
>                         window functions:
>                             window function definition
>                               alias: count_window_0
>                               name: count
>                               window function: GenericUDAFCountEvaluator
>                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)
>                               isStar: true
>                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: _col0 is not null (type: boolean)
>                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string), count_window_0 (type: bigint)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                       Spark HashTable Sink Operator
>                         keys:
>                           0 _col0 (type: string)
>                           1 _col0 (type: string)
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>
>   Stage: Stage-1
>     Spark
>       DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:8
>       Vertices:
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: t2
>                   Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                   Lateral View Forward
>                     Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                       Lateral View Join Operator
>                         outputColumnNames: _col4
>                         Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                         Select Operator
>                           expressions: _col4 (type: string)
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                           Map Join Operator
>                             condition map:
>                                  Inner Join 0 to 1
>                             keys:
>                               0 _col0 (type: string)
>                               1 _col0 (type: string)
>                             outputColumnNames: _col1
>                             input vertices:
>                               0 Reducer 2
>                             Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                             Select Operator
>                               expressions: _col1 (type: bigint)
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                               File Output Operator
>                                 compressed: false
>                                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                 table:
>                                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     Select Operator
>                       expressions: a (type: array<string>)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                       UDTF Operator
>                         Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                         function name: explode
>                         outer lateral view: true
>                         Filter Operator
>                           predicate: col is not null (type: boolean)
>                           Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                           Lateral View Join Operator
>                             outputColumnNames: _col4
>                             Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                             Select Operator
>                               expressions: _col4 (type: string)
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                               Map Join Operator
>                                 condition map:
>                                      Inner Join 0 to 1
>                                 keys:
>                                   0 _col0 (type: string)
>                                   1 _col0 (type: string)
>                                 outputColumnNames: _col1
>                                 input vertices:
>                                   0 Reducer 2
>                                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                 Select Operator
>                                   expressions: _col1 (type: bigint)
>                                   outputColumnNames: _col0
>                                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                   File Output Operator
>                                     compressed: false
>                                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                     table:
>                                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             Local Work:
>               Map Reduce Local Work
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Note that there are two {{Map 1}}s as inputs for {{Reducer 2}}.
> The result for this query is:
> {code}
> 4
> 4
> 4
> 4
> {code}
> for Hive on Spark, which is not correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
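[Editor's note] To make the incorrectness concrete, the query's intended semantics can be modeled in a few lines of plain Python (illustrative only, not Hive code): the unpartitioned {{count(*) over ()}} attaches t1's total row count (2) to every t1 row, the lateral view flattens t2's single array into four rows, and the join keeps only ids "1" and "2" — so the correct output is two rows of 2, not the four rows of 4 reported above.

```python
# Pure-Python model of the reproduction query (illustrative names, not Hive APIs).
t1 = ["1", "2"]                    # t1.a values
t2 = [["1", "2", "3", "4"]]        # t2.a: a single array<string> row

# tt1: select a as id, count(*) over () as count from t1
# An unpartitioned, unbounded window counts every row of t1.
tt1 = [(a, len(t1)) for a in t1]   # [("1", 2), ("2", 2)]

# tt2: lateral view outer explode(a) -- one output row per array element
tt2 = [elem for row in t2 for elem in row]

# select tt1.count from tt1 join tt2 on tt1.id = tt2.id
result = [cnt for (tid, cnt) in tt1 for eid in tt2 if tid == eid]
print(result)  # [2, 2] -- the correct answer; the reported Hive-on-Spark output was 4,4,4,4
```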