Content-Type: multipart/alternative; boundary="===============2044482480282087879=="
MIME-Version: 1.0
Subject: Review Request 27955: HIVE-8842 - auto_join2.q produces incorrect tree [Spark Branch]
From: "Chao Sun"
To: "Szehon Ho", "Xuefu Zhang"
Cc: "hive", "Chao Sun"
Date: Thu, 13 Nov 2014 02:29:29 -0000
Message-ID: <20141113022929.9703.43625@reviews.apache.org>

--===============2044482480282087879==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/27955/
-----------------------------------------------------------

Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-8842
    https://issues.apache.org/jira/browse/HIVE-8842


Repository: hive-git


Description
-------

With SparkMapJoinResolver and SparkReduceSinkMapJoinProc enabled, the following query:

{noformat}
explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key);
{noformat}

produces too many stages (six) and too many HashTable Sink operators. Stage-7 and Stage-6 repeat Stage-5 and Stage-4:

{noformat}
STAGE DEPENDENCIES:
  Stage-5 is a root stage
  Stage-4 depends on stages: Stage-5
  Stage-3 depends on stages: Stage-4
  Stage-7 is a root stage
  Stage-6 depends on stages: Stage-7
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-5
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src2
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)

  Stage: Stage-4
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
      Vertices:
        Map 3
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: (_col0 + _col5) is not null (type: boolean)
                        Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                        HashTable Sink Operator
                          condition expressions:
                            0 {_col0} {_col1} {_col5} {_col6}
                            1 {key} {value}
                          keys:
                            0 (_col0 + _col5) (type: double)
                            1 UDFToDouble(key) (type: double)

  Stage: Stage-3
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1
      Vertices:
        Map 2
            Map Operator Tree:
                TableScan
                  alias: src3
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: UDFToDouble(key) is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {_col0} {_col1} {_col5} {_col6}
                        1 {key} {value}
                      keys:
                        0 (_col0 + _col5) (type: double)
                        1 UDFToDouble(key) (type: double)
                      outputColumnNames: _col0, _col1, _col5, _col6, _col10, _col11
                      input vertices:
                        0 Map 3
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string), _col10 (type: string), _col11 (type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                        Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-7
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src2
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)

  Stage: Stage-6
    Spark
      DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
      Vertices:
        Map 3
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0 {key} {value}
                        1 {key} {value}
                      keys:
                        0 key (type: string)
                        1 key (type: string)
                      outputColumnNames: _col0, _col1, _col5, _col6
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                      Filter Operator
                        predicate: (_col0 + _col5) is not null (type: boolean)
                        Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                        HashTable Sink Operator
                          condition expressions:
                            0 {_col0} {_col1} {_col5} {_col6}
                            1 {key} {value}
                          keys:
                            0 (_col0 + _col5) (type: double)
                            1 UDFToDouble(key) (type: double)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java a8b7ac6

Diff: https://reviews.apache.org/r/27955/diff/


Testing
-------


Thanks,

Chao Sun

--===============2044482480282087879==--