Date: Mon, 23 Oct 2017 05:25:00 +0000 (UTC)
From: "liyunzhang_intel (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Comment Edited] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

    [ https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214644#comment-16214644 ]

liyunzhang_intel edited comment on HIVE-17193 at 10/23/17 5:24 AM:
-------------------------------------------------------------------

I can reproduce the issue after disabling CBO:
{code}
set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
set hive.cbo.enable=false;
explain
select * from
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
  (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
on a.key=b.key;
{code}

The explain output:
{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:2
      Vertices:
        Map 8
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                      Group By Operator
                        keys: _col0 (type: string)
                        mode: hash
                        outputColumnNames: _col0
                        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Spark Partition Pruning Sink Operator
                          Target column: ds (string)
                          partition key expr: ds
                          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                          target work: Map 1

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1)
        Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 (PARTITION-LEVEL SORT, 1)
        Reducer 6 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL SORT, 1)
      DagName: root_20171023004308_4b3c304e-3deb-4193-846d-12cf9e6a50ab:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: ds (type: string)
                      sort order: +
                      Map-reduce partition columns: ds (type: string)
                      Statistics: Num rows: 232 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
                      value expressions: key (type: string)
        Map 4
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Map 7
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: value is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: value (type: string)
                      sort order: +
                      Map-reduce partition columns: value (type: string)
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Reducer 2
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 key (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)
        Reducer 3
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col1 (type: string)
                  1 _col1 (type: string)
                outputColumnNames: _col0, _col1, _col2, _col3
                Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 280 Data size: 28129 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Reducer 6
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 value (type: string)
                outputColumnNames: _col0, _col2
                Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col2 (type: string), _col0 (type: string)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col1 (type: string)
                    sort order: +
                    Map-reduce partition columns: _col1 (type: string)
                    Statistics: Num rows: 255 Data size: 25572 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col0 (type: string)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

There is only one Map for srcpart (Map 1). With CBO enabled, the Maps for srcpart cannot be merged because the RS operators in those Maps are considered different; with CBO disabled, they are considered the same (see the attached [picture|https://issues.apache.org/jira/secure/attachment/12893484/17193_compare_RS_in_Map_5_1.PNG]).

> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
>                 Key: HIVE-17193
>                 URL: https://issues.apache.org/jira/browse/HIVE-17193
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}