hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10874) Fail in TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2.q due to duplicate column name
Date Mon, 01 Jun 2015 23:31:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568220#comment-14568220 ]

Jesus Camacho Rodriguez commented on HIVE-10874:
------------------------------------------------

[~jpullokkaran], this problem is not specific to Hive; the patch should go into Calcite too, and
once the next Calcite release is out, we could remove it from here.

In this case, the condition arises because we have the following plan:
{noformat}
Aggregate (f_1, sum(f_1)) 
  Union
    Aggregate (x, sum(x)) ...
    Aggregate (x, sum(x))  ...
{noformat}
where f_1 is the column holding the result of sum(x).

The problem is that Calcite derives the name of the aggregation column sum(f_1) automatically
when it builds the row schema. The generated name is f_1 ('f' for function, 1 for the position
in the tuple), which is the same name the first column already has; however, Calcite did not
verify whether the autogenerated name was already present in the tuple. This patch checks
whether the name already exists and, while it does, generates a new column name.
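
As a rough illustration of the check described above (not the actual patch), a name-deduplication loop might look like the sketch below. The class and method names are hypothetical, and the "$f" prefix with a "_n" suffix for retries is an assumption made for the example; a null entry stands for a column whose name has to be autogenerated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; this is not Calcite's API.
class UniqueFieldNames {

    // Generates a name for the column at the given position and keeps
    // generating new candidates while the candidate is already in use.
    static String deriveName(int index, Set<String> used) {
        String base = "$f" + index;
        String name = base;
        int suffix = 0;
        while (used.contains(name)) {
            name = base + "_" + suffix++;
        }
        return name;
    }

    // Builds the list of column names for a row. A null entry marks a
    // column whose name must be autogenerated. Explicit names are
    // registered first so a generated name cannot clash with any of them.
    static List<String> deriveRowNames(List<String> explicitNames) {
        Set<String> used = new HashSet<>();
        for (String n : explicitNames) {
            if (n != null) {
                used.add(n);
            }
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < explicitNames.size(); i++) {
            String n = explicitNames.get(i);
            if (n == null) {
                n = deriveName(i, used);
                used.add(n);
            }
            result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        // First column is already named $f1; the second one (position 1)
        // would also be autogenerated as $f1 without the check.
        List<String> names = new ArrayList<>();
        names.add("$f1");
        names.add(null);
        System.out.println(deriveRowNames(names)); // prints [$f1, $f1_0]
    }
}
```

The sketch only guards autogenerated names; duplicates among names supplied explicitly would need separate handling.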



> Fail in TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2.q due to duplicate column name
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-10874
>                 URL: https://issues.apache.org/jira/browse/HIVE-10874
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-10874.patch
>
>
> Aggregate operators may derive row types with duplicate column names. The reason is that the column names for grouping sets columns and aggregation columns might be generated automatically, but we do not check whether the column name already exists in the same row.
> This error can be reproduced by TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2.q, which fails with the following trace:
> {code}
> junit.framework.AssertionFailedError: Unexpected exception java.lang.AssertionError: RecordType(BIGINT $f1, BIGINT $f1)
> 	at org.apache.calcite.rel.core.Project.isValid(Project.java:200)
> 	at org.apache.calcite.rel.core.Project.<init>(Project.java:85)
> 	at org.apache.calcite.rel.core.Project.<init>(Project.java:91)
> 	at org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.<init>(HiveProject.java:70)
> 	at org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.create(HiveProject.java:103)
> 	at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.introduceDerivedTable(PlanModifierForASTConv.java:211)
> 	at org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:67)
> 	at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:94)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:617)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:248)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10108)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:208)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
> 	at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
