hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8225) CBO trunk merge: union11 test fails due to incorrect plan
Date Mon, 22 Sep 2014 23:29:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144025#comment-14144025
] 

Sergey Shelukhin commented on HIVE-8225:
----------------------------------------

Minimum query to reproduce the issue:
{noformat}
select unionsrc.key FROM (select 'tst1' as key, count(1) as value from src s1) unionsrc;
{noformat} - returns 500 rows of tst1 whereas it should return just 1.
If you add value to select list -
{noformat}
select unionsrc.key,unionsrc.value FROM (select 'tst1' as key, count(1) as value from src
s1) unionsrc;
{noformat} the problem disappears.

ASTs for both queries, both before and after CBO differ only in addition of the last select
expression; however, as is obvious from the below, in CBO the absence of said expression causes
the count to be completely gone.
Initial AST for the 2nd query (correct result):
{noformat}
TOK_QUERY
   TOK_FROM
      TOK_SUBQUERY
         TOK_QUERY
            TOK_FROM
               TOK_TABREF
                  TOK_TABNAME
                     src
                  s1
            TOK_INSERT
               TOK_DESTINATION
                  TOK_DIR
                     TOK_TMP_FILE
               TOK_SELECT
                  TOK_SELEXPR
                     'tst1'
                     key
                  TOK_SELEXPR
                     TOK_FUNCTION
                        count
                        1
                     value
         unionsrc
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  unionsrc
               key
         TOK_SELEXPR
            .
               TOK_TABLE_OR_COL
                  unionsrc
               value
{noformat}

Post-CBO
{noformat}
TOK_QUERY
   TOK_FROM
      TOK_SUBQUERY
         TOK_QUERY
            TOK_FROM
               TOK_SUBQUERY
                  TOK_QUERY
                     TOK_FROM
                        TOK_TABREF
                           TOK_TABNAME
                              default
                              src
                           s1
                     TOK_INSERT
                        TOK_DESTINATION
                           TOK_DIR
                              TOK_TMP_FILE
                        TOK_SELECT
                           TOK_SELEXPR
                              0
                              DUMMY
                  $hdt$_2
            TOK_INSERT
               TOK_DESTINATION
                  TOK_DIR
                     TOK_TMP_FILE
               TOK_SELECT
                  TOK_SELEXPR
                     1
                     $f0
         $hdt$_3
   TOK_INSERT
      TOK_DESTINATION
         TOK_DIR
            TOK_TMP_FILE
      TOK_SELECT
         TOK_SELEXPR
            'tst1'
            unionsrc.key
         TOK_SELEXPR
            TOK_FUNCTION
               count
               .
                  TOK_TABLE_OR_COL
                     $hdt$_3
                  $f0
            unionsrc.value
{noformat}
Note where count is... in key-only query, where this SELEXPER is gone, there's no count, so
result changes.

> CBO trunk merge: union11 test fails due to incorrect plan
> ---------------------------------------------------------
>
>                 Key: HIVE-8225
>                 URL: https://issues.apache.org/jira/browse/HIVE-8225
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> The result changes to as if the union didn't have count() inside. The issue can be fixed
by using srcunion.value outside the subquery in count (replace count(1) with count(srcunion.value)).
Otherwise, it looks like count(1) node from union-ed queries is not present in AST at all,
which might cause this result.
> Interestingly, adding group by to each query in a union produces completely weird result
(count(1) is 309 for each key, whereas it should be 1 and the "logical" incorrect value if
internal count is lost is 500)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message