hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15327) Outerjoin might produce wrong result depending on joinEmitInterval value
Date Thu, 01 Dec 2016 18:35:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712692#comment-15712692
] 

Ashutosh Chauhan commented on HIVE-15327:
-----------------------------------------

* We need to check both filterMaps and joinCondtionDesc, no? In the patch you are no longer
checking for filterMaps.
* It will be good to have few rows in test table with join column value as NULL. 
* You can insert multiple rows in single statement.. e.g., INSERT INTO tbl1 VALUES ('Id_1',
'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103');

> Outerjoin might produce wrong result depending on joinEmitInterval value
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15327
>                 URL: https://issues.apache.org/jira/browse/HIVE-15327
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.3.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-15327.01.patch, HIVE-15327.patch
>
>
> If joinEmitInterval is smaller than the group size, outerjoins might produce records
with NULL appended values multiple times (once per group).
> HIVE-4689 targeted the same problem. However, the fix does not seem to cover all cases
(in particular, it will not apply to left outer joins with filter conditions on the left input).
The solution in HIVE-4689 was to disable (override) joinEmitInterval value for those cases.
This fix follows the same approach.
> To reproduce the problem:
> {code}
> set hive.strict.checks.cartesian.product=false;
> set hive.join.emit.interval=1;
> CREATE TABLE test1 (key INT, value INT, col_1 STRING);
> INSERT INTO test1 VALUES (99, 0, 'Alice');
> INSERT INTO test1 VALUES (99, 2, 'Mat');
> INSERT INTO test1 VALUES (100, 1, 'Bob');
> INSERT INTO test1 VALUES (101, 2, 'Car');
> CREATE TABLE test2 (key INT, value INT, col_2 STRING);
> INSERT INTO test2 VALUES (102, 2, 'Del');
> INSERT INTO test2 VALUES (103, 2, 'Ema');
> INSERT INTO test2 VALUES (104, 3, 'Fli');
> -- Equi-condition and condition on one input (left outer join)
> SELECT *
> FROM test1 LEFT OUTER JOIN test2
> ON (test1.value=test2.value AND test1.key between 100 and 102)
> LIMIT 10;
> -- Condition on one input (left outer join)
> SELECT *
> FROM test1 LEFT OUTER JOIN test2
> ON (test1.key between 100 and 102)
> LIMIT 10;
> {code}
> For the *first* query, current (incorrect) result is:
> {noformat}
>  99     0       Alice   NULL    NULL    NULL
>  100    1       Bob     NULL    NULL    NULL
>  101    2       Car     103     2       Ema
>  99     2       Mat     NULL    NULL    NULL
>  101    2       Car     102     2       Del
>  99     2       Mat     NULL    NULL    NULL
> {noformat}
> Expected (correct) result is:
> {noformat}
>  99     0       Alice   NULL    NULL    NULL
>  100    1       Bob     NULL    NULL    NULL
>  101    2       Car     103     2       Ema
>  101    2       Car     102     2       Del
>  99     2       Mat     NULL    NULL    NULL
> {noformat}
> For the *second* query, current (incorrect) result is:
> {noformat}
>  101    2       Car     104     3       Fli
>  100    1       Bob     104     3       Fli
>  99     2       Mat     NULL    NULL    NULL
>  99     0       Alice   NULL    NULL    NULL
>  101    2       Car     103     2       Ema
>  100    1       Bob     103     2       Ema
>  99     2       Mat     NULL    NULL    NULL
>  99     0       Alice   NULL    NULL    NULL
>  101    2       Car     102     2       Del
>  100    1       Bob     102     2       Del
> {noformat}
> Expected (correct) result is:
> {noformat}
>  101    2       Car     104     3       Fli
>  101    2       Car     103     2       Ema
>  101    2       Car     102     2       Del
>  100    1       Bob     104     3       Fli
>  100    1       Bob     103     2       Ema
>  100    1       Bob     102     2       Del
>  99     2       Mat     NULL    NULL    NULL
>  99     0       Alice   NULL    NULL    NULL
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message