Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id BF51C200CF1 for ; Mon, 28 Aug 2017 20:54:10 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id BDAB216569D; Mon, 28 Aug 2017 18:54:10 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id DD47116569B for ; Mon, 28 Aug 2017 20:54:09 +0200 (CEST) Received: (qmail 47634 invoked by uid 500); 28 Aug 2017 18:54:09 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 47625 invoked by uid 99); 28 Aug 2017 18:54:08 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 28 Aug 2017 18:54:08 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 78E3BCAB60 for ; Mon, 28 Aug 2017 18:54:08 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id cWwcGh9nN0fW for ; Mon, 28 Aug 2017 18:54:07 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 9E4DC5FBE2 for ; Mon, 28 Aug 2017 18:54:06 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 95320E01A8 for ; Mon, 28 Aug 2017 18:54:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 255A123F15 for ; Mon, 28 Aug 2017 18:54:00 +0000 (UTC) Date: Mon, 28 Aug 2017 18:54:00 +0000 (UTC) From: "Janaki Lahorani (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 28 Aug 2017 18:54:10 -0000 [ https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144195#comment-16144195 ] Janaki Lahorani commented on HIVE-17396: ---------------------------------------- HIVE.17225.1 has a potential fix. This will be further enhanced in this JIRA. > Support DPP with map joins where the source and target belong in the same stage > ------------------------------------------------------------------------------- > > Key: HIVE-17396 > URL: https://issues.apache.org/jira/browse/HIVE-17396 > Project: Hive > Issue Type: Sub-task > Components: Spark > Reporter: Janaki Lahorani > Assignee: Janaki Lahorani > > When the target of a partition pruning sink operator is in not the same as the target of hash table sink operator, both source and target gets scheduled within the same spark job, and that can result in File Not Found Exception. HIVE-17225 has a fix to disable DPP in that scenario. This JIRA is to support DPP for such cases. > Test Case: > SET hive.spark.dynamic.partition.pruning=true; > SET hive.auto.convert.join=true; > SET hive.strict.checks.cartesian.product=false; > CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int); > CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int); > CREATE TABLE reg_table (col int); > ALTER TABLE part_table1 ADD PARTITION (part1_col = 1); > ALTER TABLE part_table2 ADD PARTITION (part2_col = 1); > ALTER TABLE part_table2 ADD PARTITION (part2_col = 2); > INSERT INTO TABLE part_table1 PARTITION (part1_col = 1) VALUES (1); > INSERT INTO TABLE part_table2 PARTITION (part2_col = 1) VALUES (1); > INSERT INTO TABLE part_table2 PARTITION (part2_col = 2) VALUES (2); > INSERT INTO table reg_table VALUES (1), (2), (3), (4), (5), (6); > EXPLAIN SELECT * > FROM part_table1 pt1, > part_table2 pt2, > reg_table rt > WHERE rt.col = pt1.part1_col > AND pt2.part2_col = pt1.part1_col; > Plan: > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > #### A masked pattern was here #### > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: pt1 > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part1_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > 2 _col0 (type: int) > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > Target column: part2_col (int) > partition key expr: part2_col > Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE > target work: Map 2 > Local Work: > Map Reduce Local Work > Map 2 > Map Operator Tree: > TableScan > alias: pt2 > Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part2_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > 2 _col0 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > #### A masked pattern was here #### > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: rt > Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE > Filter Operator > predicate: col is not null (type: boolean) > Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > Inner Join 0 to 2 > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > 2 _col0 (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, _col4 > input vertices: > 0 Map 1 > 1 Map 2 > Statistics: Num rows: 13 Data size: 13 Basic stats: COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 13 Data size: 13 Basic stats: COMPLETE Column stats: NONE > table: > input format: org.apache.hadoop.mapred.SequenceFileInputFormat > output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Local Work: > Map Reduce Local Work > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink -- This message was sent by Atlassian JIRA (v6.4.14#64029)