hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13767) Wrong type inferred in Semijoin condition leads to AssertionError
Date Mon, 16 May 2016 18:25:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285000#comment-15285000 ]

Jesus Camacho Rodriguez commented on HIVE-13767:
------------------------------------------------

The method pushes any equi-join conditions that are not column references as Projections
on top of the children of the join, and returns the new condition.

The problem is that the positions of the columns are tracked incorrectly, and thus we end
up inferring an incorrect type for the column.

Let me go into detail in the code.

Observe the loop in L197-L211. It determines which conditions are column references, and which
ones need to be pushed to the inputs of the join. Assume we have a condition {{CAST(a)=b AND
c=d}}. The first conjunct is added to the conditions that need to be pushed to the inputs (because
of the CAST), while the second conjunct is added to the conditions that do not need to be pushed,
i.e., they can be referenced directly.

Then the loop in L213-L229 creates the first part of the condition, consisting of the equality
conditions that do not need to be pushed, i.e., {{c=d}} in our example. But observe that _leftKey_
and _rightKey_, which are used to infer the type, are extracted from _leftJoinKeys_ and
_rightJoinKeys_ respectively, using index _i_ from origColEqConds. This is not right: at _i=0_,
_leftKey_ will reference {{CAST(a)}} and _rightKey_ will reference {{b}}, whereas the condition
that does not need to be pushed is at _i=1_ in those lists.

Thus, we need to keep the positions in _leftJoinKeys_ and _rightJoinKeys_ of the conditions
that do not need to be pushed: we keep this information in the new _origColEqCondsPos_ bitset.
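
To make the index mismatch concrete, here is a minimal, self-contained Java sketch. It is not
the Hive code: the class, the method names, the string-based keys and the isColumnRef helper
are hypothetical stand-ins chosen for illustration. Only _leftJoinKeys_, _rightJoinKeys_,
origColEqConds and _origColEqCondsPos_ mirror the names used in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative only: keys are plain strings rather than Calcite RexNodes.
class JoinKeyPositions {

    // Hypothetical stand-in for "this key is a plain column reference".
    static boolean isColumnRef(String expr) {
        return !expr.contains("(");
    }

    // Mimics the bug: the second loop iterates over the filtered list
    // (origColEqConds) but uses its index i to read the unfiltered key lists.
    static List<String> pairKeysByFilteredIndex(List<String> leftJoinKeys,
                                                List<String> rightJoinKeys) {
        List<String> origColEqConds = new ArrayList<>();
        for (int i = 0; i < leftJoinKeys.size(); i++) {
            if (isColumnRef(leftJoinKeys.get(i)) && isColumnRef(rightJoinKeys.get(i))) {
                origColEqConds.add(leftJoinKeys.get(i) + "=" + rightJoinKeys.get(i));
            }
        }
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < origColEqConds.size(); i++) {
            // Wrong: i is a position in origColEqConds, not in the key lists.
            picked.add(leftJoinKeys.get(i) + "=" + rightJoinKeys.get(i));
        }
        return picked;
    }

    // Mimics the fix: remember the original positions in a bitset and
    // iterate over those positions when reading the key lists.
    static List<String> pairKeysByRecordedPosition(List<String> leftJoinKeys,
                                                   List<String> rightJoinKeys) {
        BitSet origColEqCondsPos = new BitSet();
        for (int i = 0; i < leftJoinKeys.size(); i++) {
            if (isColumnRef(leftJoinKeys.get(i)) && isColumnRef(rightJoinKeys.get(i))) {
                origColEqCondsPos.set(i);
            }
        }
        List<String> picked = new ArrayList<>();
        for (int i = origColEqCondsPos.nextSetBit(0); i >= 0;
             i = origColEqCondsPos.nextSetBit(i + 1)) {
            picked.add(leftJoinKeys.get(i) + "=" + rightJoinKeys.get(i));
        }
        return picked;
    }

    public static void main(String[] args) {
        // CAST(a)=b AND c=d, split into left and right keys.
        List<String> left = Arrays.asList("CAST(a)", "c");
        List<String> right = Arrays.asList("b", "d");
        System.out.println(pairKeysByFilteredIndex(left, right));    // prints [CAST(a)=b]
        System.out.println(pairKeysByRecordedPosition(left, right)); // prints [c=d]
    }
}
```

With the example condition, the first variant picks up the keys of the CAST conjunct (and hence
their types), while the second one picks up {{c=d}} as intended.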

> Wrong type inferred in Semijoin condition leads to AssertionError
> -----------------------------------------------------------------
>
>                 Key: HIVE-13767
>                 URL: https://issues.apache.org/jira/browse/HIVE-13767
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13767.patch
>
>
> Following query fails to run:
> {noformat}
> SELECT
>     COALESCE(498, LEAD(COALESCE(-973, -684, 515)) OVER (PARTITION BY (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50), FLOOR(t1.double_col_16) DESC), 524) AS int_col,
>     (t2.int_col_10) + (t1.smallint_col_50) AS int_col_1,
>     FLOOR(t1.double_col_16) AS float_col,
>     COALESCE(SUM(COALESCE(62, -380, -435)) OVER (PARTITION BY (t2.int_col_10 + t1.smallint_col_50) ORDER BY (t2.int_col_10 + t1.smallint_col_50) DESC, FLOOR(t1.double_col_16) DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 48 FOLLOWING), 704) AS int_col_2
> FROM table_1 t1
> INNER JOIN table_18 t2 ON (((t2.tinyint_col_15) = (t1.bigint_col_7)) AND
>                            ((t2.decimal2709_col_9) = (t1.decimal2016_col_26))) AND
>                            ((t2.tinyint_col_20) = (t1.tinyint_col_3))
> WHERE (t2.smallint_col_19) IN (SELECT
>     COALESCE(-92, -994) AS int_col
>     FROM table_1 tt1
>     INNER JOIN table_18 tt2 ON (tt2.decimal1911_col_16) = (tt1.decimal2612_col_77)
>     WHERE (t1.timestamp_col_9) = (tt2.timestamp_col_18));
> {noformat}
> Following error is seen in the logs:
> {noformat}
> 2016-04-27T04:32:09,605 WARN  [...2a24 HiveServer2-Handler-Pool: Thread-211]: thrift.ThriftCLIService
(ThriftCLIService.java:ExecuteStatement(501)) - Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.AssertionError:
mismatched type $8 TIMESTAMP(9)
>         at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:178)
~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:216)
~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.cli.operation.Operation.run(Operation.java:327) ~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:458)
~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:435)
~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:272)
~[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:492)
[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
[hive-service-rpc-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
[hive-service-rpc-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
[hive-service-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[?:1.8.0_77]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[?:1.8.0_77]
>         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]
> Caused by: java.lang.AssertionError: mismatched type $8 TIMESTAMP(9)
>         at org.apache.calcite.rex.RexUtil$FixNullabilityShuttle.visitInputRef(RexUtil.java:2042)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexUtil$FixNullabilityShuttle.visitInputRef(RexUtil.java:2020)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.visitList(RexShuttle.java:144) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:93) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:36) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexCall.accept(RexCall.java:108) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.apply(RexShuttle.java:275) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.mutate(RexShuttle.java:234) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexShuttle.apply(RexShuttle.java:252) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rex.RexUtil.fixUp(RexUtil.java:1239) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.rel.rules.FilterJoinRule.perform(FilterJoinRule.java:232)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterJoinRule$HiveFilterJoinMergeRule.onMatch(HiveFilterJoinRule.java:78)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:318)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterJoinRule$HiveFilterJoinMergeRule.onMatch(HiveFilterJoinRule.java:78)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:318)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:514) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:285)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:72)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.hepPlan(CalcitePlanner.java:1293)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.applyPreJoinOrderingTransforms(CalcitePlanner.java:1166)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:956)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:887)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:969)
~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106) ~[calcite-core-1.6.0.2.5.0.0-248.jar:1.6.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:706)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:274)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10642)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476) ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318) ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1191) ~[hive-exec-2.1.0.2.5.0.0-248.jar:2.1.0.2.5.0.0-248]
>         ... 15 more
> {noformat}
> Hive DDL for the supporting tables is:
> {noformat}
> CREATE TABLE table_1 (timestamp_col_1 TIMESTAMP, decimal3003_col_2 DECIMAL(30, 3), tinyint_col_3
TINYINT, decimal0101_col_4 DECIMAL(1, 1), boolean_col_5 BOOLEAN, float_col_6 FLOAT, bigint_col_7
BIGINT, varchar0098_col_8 VARCHAR(98), timestamp_col_9 TIMESTAMP, bigint_col_10 BIGINT, decimal0903_col_11
DECIMAL(9, 3), timestamp_col_12 TIMESTAMP, timestamp_col_13 TIMESTAMP, float_col_14 FLOAT,
char0254_col_15 CHAR(254), double_col_16 DOUBLE, timestamp_col_17 TIMESTAMP, boolean_col_18
BOOLEAN, decimal2608_col_19 DECIMAL(26, 8), varchar0216_col_20 VARCHAR(216), string_col_21
STRING, bigint_col_22 BIGINT, boolean_col_23 BOOLEAN, timestamp_col_24 TIMESTAMP, boolean_col_25
BOOLEAN, decimal2016_col_26 DECIMAL(20, 16), string_col_27 STRING, decimal0202_col_28 DECIMAL(2,
2), float_col_29 FLOAT, decimal2020_col_30 DECIMAL(20, 20), boolean_col_31 BOOLEAN, double_col_32
DOUBLE, varchar0148_col_33 VARCHAR(148), decimal2121_col_34 DECIMAL(21, 21), tinyint_col_35
TINYINT, boolean_col_36 BOOLEAN, boolean_col_37 BOOLEAN, string_col_38 STRING, decimal3420_col_39
DECIMAL(34, 20), timestamp_col_40 TIMESTAMP, decimal1408_col_41 DECIMAL(14, 8), string_col_42
STRING, decimal0902_col_43 DECIMAL(9, 2), varchar0204_col_44 VARCHAR(204), boolean_col_45
BOOLEAN, timestamp_col_46 TIMESTAMP, boolean_col_47 BOOLEAN, bigint_col_48 BIGINT, boolean_col_49
BOOLEAN, smallint_col_50 SMALLINT, decimal0704_col_51 DECIMAL(7, 4), timestamp_col_52 TIMESTAMP,
boolean_col_53 BOOLEAN, timestamp_col_54 TIMESTAMP, int_col_55 INT, decimal0505_col_56 DECIMAL(5,
5), char0155_col_57 CHAR(155), boolean_col_58 BOOLEAN, bigint_col_59 BIGINT, boolean_col_60
BOOLEAN, boolean_col_61 BOOLEAN, char0249_col_62 CHAR(249), boolean_col_63 BOOLEAN, timestamp_col_64
TIMESTAMP, decimal1309_col_65 DECIMAL(13, 9), int_col_66 INT, float_col_67 FLOAT, timestamp_col_68
TIMESTAMP, timestamp_col_69 TIMESTAMP, boolean_col_70 BOOLEAN, timestamp_col_71 TIMESTAMP,
double_col_72 DOUBLE, boolean_col_73 BOOLEAN, char0222_col_74 CHAR(222), float_col_75 FLOAT,
string_col_76 STRING, decimal2612_col_77 DECIMAL(26, 12), timestamp_col_78 TIMESTAMP, char0128_col_79
CHAR(128), timestamp_col_80 TIMESTAMP, double_col_81 DOUBLE, timestamp_col_82 TIMESTAMP, float_col_83
FLOAT, decimal2622_col_84 DECIMAL(26, 22), double_col_85 DOUBLE, float_col_86 FLOAT, decimal0907_col_87
DECIMAL(9, 7)) STORED AS orc;
> CREATE TABLE table_18 (boolean_col_1 BOOLEAN, boolean_col_2 BOOLEAN, decimal2518_col_3
DECIMAL(25, 18), float_col_4 FLOAT, timestamp_col_5 TIMESTAMP, double_col_6 DOUBLE, double_col_7
DOUBLE, char0035_col_8 CHAR(35), decimal2709_col_9 DECIMAL(27, 9), int_col_10 INT, timestamp_col_11
TIMESTAMP, decimal3604_col_12 DECIMAL(36, 4), string_col_13 STRING, int_col_14 INT, tinyint_col_15
TINYINT, decimal1911_col_16 DECIMAL(19, 11), float_col_17 FLOAT, timestamp_col_18 TIMESTAMP,
smallint_col_19 SMALLINT, tinyint_col_20 TINYINT, timestamp_col_21 TIMESTAMP, boolean_col_22
BOOLEAN, int_col_23 INT) STORED AS orc;
> {noformat}
> The problem is that the reference indices in the condition (and thus, their type) are inferred incorrectly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
