Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Mon, 7 Jul 2014 22:05:34 +0000 (UTC)
From: "Ashutosh Chauhan (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12725198.1404374797083.15491.1404770734553@arcas>
In-Reply-To: <JIRA.12725198.1404374797083@arcas>
References: <JIRA.12725198.1404374797083@arcas>
Subject: [jira] [Updated] (HIVE-7346) Wrong results caused by hive ppd under
 specific join condition
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HIVE-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-7346:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.14.0
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

> Wrong results caused by hive ppd under specific join condition
> --------------------------------------------------------------
>
>                 Key: HIVE-7346
>                 URL: https://issues.apache.org/jira/browse/HIVE-7346
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0, 0.13.0, 0.13.1
>            Reporter: dima machlin
>            Assignee: Navis
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7346.1.patch.txt
>
>
> Assuming two tables :
> {code:sql} t1(id1 string,id2 string) , t2 (id string,d int) {code}
> t1 contains 1 row : 'a','a'
> t2 contains 1 row : 'a',2
> The following query : 
> {code:sql} select a.*,b.d d1,c.d d2
> from t1 a join t2 b on (a.id1=b.id)
> join t2 c on (a.id2=b.id)
> where b.d <=1 and c.d<=1 {code}
> Returns 0 rows as expected because t2.d = 2
> Wrapping this query, like so : 
> {code:sql} select * from (
> select a.*,b.d d1,c.d d2
> from t1 a join t2 b on (a.id1=b.id)
> join t2 c on (a.id2=b.id)
> where b.d <=1 and c.d<=1
> ) z where d1>1 or d2>1 {code}
> Where another filter was add on the columns causes the plan to lack the filter of the "<=1" and return a single row - *Wrong Results*.
> The plan is : 
> {code:sql}
> ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))
> STAGE DEPENDENCIES:
>   Stage-7 is a root stage
>   Stage-5 depends on stages: Stage-7
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-7
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         z:b 
>           Fetch Operator
>             limit: -1
>         z:c 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         z:b 
>           TableScan
>             alias: b
>             HashTable Sink Operator
>               condition expressions:
>                 0 {id1} {id2}
>                 1 {id} {d}
>               handleSkewJoin: false
>               keys:
>                 0 [Column[id1]]
>                 1 [Column[id]]
>               Position of Big Table: 0
>         z:c 
>           TableScan
>             alias: c
>             HashTable Sink Operator
>               condition expressions:
>                 0 {_col5} {_col0} {_col1}
>                 1 {d}
>               handleSkewJoin: false
>               keys:
>                 0 []
>                 1 []
>               Position of Big Table: 0
>   Stage: Stage-5
>     Map Reduce
>       Alias -> Map Operator Tree:
>         z:a 
>           TableScan
>             alias: a
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {id1} {id2}
>                 1 {id} {d}
>               handleSkewJoin: false
>               keys:
>                 0 [Column[id1]]
>                 1 [Column[id]]
>               outputColumnNames: _col0, _col1, _col4, _col5
>               Position of Big Table: 0
>               Filter Operator
>                 predicate:
>                     expr: (_col1 = _col4)
>                     type: boolean
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   condition expressions:
>                     0 {_col5} {_col0} {_col1}
>                     1 {d}
>                   handleSkewJoin: false
>                   keys:
>                     0 []
>                     1 []
>                   outputColumnNames: _col1, _col4, _col5, _col9
>                   Position of Big Table: 0
>                   Filter Operator
>                     predicate:
>                         expr: ((_col1 > 1) or (_col9 > 1))
>                         type: boolean
>                     Select Operator
>                       expressions:
>                             expr: _col4
>                             type: string
>                             expr: _col5
>                             type: string
>                             expr: _col1
>                             type: int
>                             expr: _col9
>                             type: int
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       File Output Operator
>                         compressed: false
>                         GlobalTableId: 0
>                         table:
>                             input format: org.apache.hadoop.mapred.TextInputFormat
>                             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> {code}
> Setting : {code:sql} hive.optimize.ppd=false {code}
> Results in the following *correct* plan : {code:sql} ABSTRACT SYNTAX TREE:
>   (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_JOIN (TOK_TABREF (TOK_TABNAME t1) a) (TOK_TABREF (TOK_TABNAME t2) b) (= (. (TOK_TABLE_OR_COL a) id1) (. (TOK_TABLE_OR_COL b) id))) (TOK_TABREF (TOK_TABNAME t2) c) (= (. (TOK_TABLE_OR_COL a) id2) (. (TOK_TABLE_OR_COL b) id)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_ALLCOLREF (TOK_TABNAME a))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) d) d1) (TOK_SELEXPR (. (TOK_TABLE_OR_COL c) d) d2)) (TOK_WHERE (and (<= (. (TOK_TABLE_OR_COL b) d) 1) (<= (. (TOK_TABLE_OR_COL c) d) 1))))) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (or (> (TOK_TABLE_OR_COL d1) 1) (> (TOK_TABLE_OR_COL d2) 1)))))
> STAGE DEPENDENCIES:
>   Stage-7 is a root stage
>   Stage-5 depends on stages: Stage-7
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-7
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         z:b 
>           Fetch Operator
>             limit: -1
>         z:c 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         z:b 
>           TableScan
>             alias: b
>             HashTable Sink Operator
>               condition expressions:
>                 0 {id1} {id2}
>                 1 {id} {d}
>               handleSkewJoin: false
>               keys:
>                 0 [Column[id1]]
>                 1 [Column[id]]
>               Position of Big Table: 0
>         z:c 
>           TableScan
>             alias: c
>             HashTable Sink Operator
>               condition expressions:
>                 0 {_col5} {_col0} {_col1}
>                 1 {d}
>               handleSkewJoin: false
>               keys:
>                 0 []
>                 1 []
>               Position of Big Table: 0
>   Stage: Stage-5
>     Map Reduce
>       Alias -> Map Operator Tree:
>         z:a 
>           TableScan
>             alias: a
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {id1} {id2}
>                 1 {id} {d}
>               handleSkewJoin: false
>               keys:
>                 0 [Column[id1]]
>                 1 [Column[id]]
>               outputColumnNames: _col0, _col1, _col4, _col5
>               Position of Big Table: 0
>               Filter Operator
>                 predicate:
>                     expr: (_col1 = _col4)
>                     type: boolean
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   condition expressions:
>                     0 {_col5} {_col0} {_col1}
>                     1 {d}
>                   handleSkewJoin: false
>                   keys:
>                     0 []
>                     1 []
>                   outputColumnNames: _col1, _col4, _col5, _col9
>                   Position of Big Table: 0
>                   Filter Operator
>                     predicate:
>                         expr: ((_col1 <= 1) and (_col9 <= 1))
>                         type: boolean
>                     Select Operator
>                       expressions:
>                             expr: _col4
>                             type: string
>                             expr: _col5
>                             type: string
>                             expr: _col1
>                             type: int
>                             expr: _col9
>                             type: int
>                       outputColumnNames: _col0, _col1, _col2, _col3
>                       Filter Operator
>                         predicate:
>                             expr: ((_col2 > 1) or (_col3 > 1))
>                             type: boolean
>                         Select Operator
>                           expressions:
>                                 expr: _col0
>                                 type: string
>                                 expr: _col1
>                                 type: string
>                                 expr: _col2
>                                 type: int
>                                 expr: _col3
>                                 type: int
>                           outputColumnNames: _col0, _col1, _col2, _col3
>                           File Output Operator
>                             compressed: false
>                             GlobalTableId: 0
>                             table:
>                                 input format: org.apache.hadoop.mapred.TextInputFormat
>                                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1 {code}


--
This message was sent by Atlassian JIRA
(v6.2#6252)