hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8433) CBO loses a column during AST conversion
Date Tue, 14 Oct 2014 01:48:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170361#comment-14170361 ]

Sergey Shelukhin commented on HIVE-8433:
----------------------------------------

The schema is incorrect, and the schema check appears to be broken as well.
With an ORDER BY and an extra column (e.g. key) added to the select list of the query,
fixTopOB detects the schema mismatch and CBO fails.
Without an ORDER BY, the schema mismatch check is never performed, and the query, as
described, happens to produce a correct plan and result by coincidence.
However, if the extra column (the one that causes the mismatch) is also the only ORDER BY
column, fixTopOB removes it from the projection (presumably on the assumption that it is
there only for the ORDER BY). The number of columns in the projection then happens to match
the (incorrect) schema, so an incorrect result is produced.
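
The three cases above can be illustrated with a rough Python model. This is only a
sketch of the behaviour described in this comment, not Hive's actual fixTopOB code:
the function name fix_top_order_by, the SchemaMismatch exception, and the idea of
passing the schema as a bare column count are all assumptions made for illustration.

```python
class SchemaMismatch(Exception):
    """Hypothetical error: projection cannot be reconciled with the result schema."""


def fix_top_order_by(projection, schema_size, sort_indexes=None):
    """Toy model of the check described above (not Hive's implementation).

    projection   -- list of projected column expressions, in order
    schema_size  -- number of columns in the (possibly incorrect) result schema
    sort_indexes -- set of projection indexes used by the top ORDER BY,
                    or None when the query has no ORDER BY
    """
    # Without ORDER BY the mismatch check is never performed.
    if sort_indexes is None:
        return list(projection)

    # Nothing to trim if the projection is not wider than the schema.
    if len(projection) <= schema_size:
        return list(projection)

    # Columns past the end of the schema that are sort keys are assumed
    # to be present only for the ORDER BY.
    order_by_only = {
        i for i in range(schema_size, len(projection)) if i in sort_indexes
    }

    # If dropping those columns does not reconcile the counts, fail.
    if len(projection) - len(order_by_only) != schema_size:
        raise SchemaMismatch("result schema does not match projection")

    # Otherwise drop them; the counts now match even if the schema is wrong.
    return [col for i, col in enumerate(projection) if i not in order_by_only]
```

Under this model, the query from the issue (two projected columns, a schema that
wrongly holds one column, ORDER BY on the second) passes the check and silently
loses the second column. Adding key to the select list, and assuming the schema
then holds two columns, leaves an unexplained extra column and raises the error.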

> CBO loses a column during AST conversion
> ----------------------------------------
>
>                 Key: HIVE-8433
>                 URL: https://issues.apache.org/jira/browse/HIVE-8433
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>
> {noformat}
> SELECT
>   CAST(value AS BINARY),
>   value
> FROM src
> ORDER BY value
> LIMIT 100
> {noformat}
> returns only one column.
> Final CBO plan is
> {noformat}
>   HiveSortRel(sort0=[$1], dir0=[ASC]): rowcount = 500.0, cumulative cost = {24858.432393688767 rows, 500.0 cpu, 0.0 io}, id = 44
>     HiveProjectRel(value=[CAST($0):BINARY(2147483647) NOT NULL], value1=[$0]): rowcount = 500.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 42
>       HiveProjectRel(value=[$1]): rowcount = 500.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 40
>         HiveTableScanRel(table=[[default.src]]): rowcount = 500.0, cumulative cost = {0}, id = 0
> {noformat}
> but the resulting AST has only one column. This must be a bug in the conversion, probably
> related to the name collision in the schema, judging by the alias of the column for the
> binary-cast value in the AST:
> {noformat} 
> TOK_QUERY
>    TOK_FROM
>       TOK_SUBQUERY
>          TOK_QUERY
>             TOK_FROM
>                TOK_TABREF
>                   TOK_TABNAME
>                      default
>                      src
>                   src
>             TOK_INSERT
>                TOK_DESTINATION
>                   TOK_DIR
>                      TOK_TMP_FILE
>                TOK_SELECT
>                   TOK_SELEXPR
>                      .
>                         TOK_TABLE_OR_COL
>                            src
>                         value
>                      value
>          $hdt$_0
>    TOK_INSERT
>       TOK_DESTINATION
>          TOK_DIR
>             TOK_TMP_FILE
>       TOK_SELECT
>          TOK_SELEXPR
>             TOK_FUNCTION
>                TOK_BINARY
>                .
>                   TOK_TABLE_OR_COL
>                      $hdt$_0
>                   value
>             value
>       TOK_ORDERBY
>          TOK_TABSORTCOLNAMEASC
>             TOK_TABLE_OR_COL
>                value
>       TOK_LIMIT
>          100
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
