Return-Path: X-Original-To: apmail-hive-issues-archive@minotaur.apache.org Delivered-To: apmail-hive-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2C51A18D66 for ; Tue, 24 Nov 2015 15:47:11 +0000 (UTC) Received: (qmail 83772 invoked by uid 500); 24 Nov 2015 15:47:11 -0000 Delivered-To: apmail-hive-issues-archive@hive.apache.org Received: (qmail 83743 invoked by uid 500); 24 Nov 2015 15:47:11 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 83714 invoked by uid 99); 24 Nov 2015 15:47:11 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Nov 2015 15:47:11 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id F168E2C044E for ; Tue, 24 Nov 2015 15:47:10 +0000 (UTC) Date: Tue, 24 Nov 2015 15:47:10 +0000 (UTC) From: "Jesus Camacho Rodriguez (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (HIVE-12465) Hive might produce wrong results when (outer) joins are merged MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12465: ------------------------------------------- Attachment: (was: HIVE-12465.patch) > Hive might produce wrong results when (outer) joins are merged > -------------------------------------------------------------- > > Key: HIVE-12465 > URL: https://issues.apache.org/jira/browse/HIVE-12465 > Project: Hive > Issue Type: Bug > Affects Versions: 1.3.0, 2.0.0 > Reporter: Jesus Camacho Rodriguez > Assignee: Jesus Camacho Rodriguez > Priority: Blocker > Attachments: HIVE-12465.01.patch, HIVE-12465.02.patch, HIVE-12465.patch > > > Consider the following query: > {noformat} > select * from > (select * from tab where tab.key = 0)a > full outer join > (select * from tab_part where tab_part.key = 98)b > join > tab_part c > on a.key = b.key and b.key = c.key; > {noformat} > Hive should execute the full outer join operation (without ON clause) and then the join operation (ON a.key = b.key and b.key = c.key). Instead, it merges both joins, generating the following plan: > {noformat} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Map Reduce > Map Operator Tree: > TableScan > alias: tab > filterExpr: (key = 0) (type: boolean) > Statistics: Num rows: 242 Data size: 22748 Basic stats: COMPLETE Column stats: NONE > Filter Operator > predicate: (key = 0) (type: boolean) > Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: 0 (type: int), value (type: string), ds (type: string) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: string), _col2 (type: string) > TableScan > alias: tab_part > filterExpr: (key = 98) (type: boolean) > Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE > Filter Operator > predicate: (key = 98) (type: boolean) > Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: 98 (type: int), value (type: string), ds (type: string) > outputColumnNames: _col0, _col1, _col2 > Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: int) > sort order: + > Map-reduce partition columns: _col0 (type: int) > Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: string), _col2 (type: string) > TableScan > alias: c > Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: key (type: int) > sort order: + > Map-reduce partition columns: key (type: int) > Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE Column stats: NONE > value expressions: value (type: string), ds (type: string) > Reduce Operator Tree: > Join Operator > condition map: > Outer Join 0 to 1 > Inner Join 1 to 2 > keys: > 0 _col0 (type: int) > 1 _col0 (type: int) > 2 key (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8 > Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE Column stats: NONE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {noformat} > That plan is equivalent to the following query, which is different than the original one: > {noformat} > select * from > (select * from tab where tab.key = 0)a > full outer join > (select * from tab_part where tab_part.key = 98)b > on a.key = b.key > join > tab_part c > on b.key = c.key; > {noformat} > It seems to be a problem in the recognition of join operations that can be merged into a single multijoin operator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)