hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
Date Tue, 16 Sep 2014 20:48:34 GMT


Sergey Shelukhin commented on HIVE-8111:

[~ashutoshc] [~jpullokkaran] FYI. I tried options 1 and 2 and ran into problems; for now I am
exploring options 5 and 3. Let me know if you have any input.

The biggest problem is a decimal becoming NULL because of an incorrect type. For example, for
SELECT key * value FROM DECIMAL_UDF, "expressions: (key * value) (type: decimal(31,10))" becomes
"expressions: (key * CAST( value AS decimal(31,10))) (type: decimal(38,20))", and 1524157875171467887.5019052100
becomes NULL because its integer part has 19 digits, while decimal(38,20) leaves room for only 18.
Incorrect types can also change the result type of an expression, which I assume can give insert/create
queries undesirable results; I am not sure about other possible effects.
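
To make the arithmetic concrete, here is a minimal Python sketch (not Hive code) of why the value above fits decimal(31,10) but not decimal(38,20). It assumes only that a decimal(p,s) type leaves p - s digits for the integer part and that an overflow yields NULL, as in the example above; the helper names integer_digits and to_decimal_type are made up for illustration.

```python
from decimal import Decimal
from typing import Optional


def integer_digits(value: Decimal) -> int:
    """Count the digits to the left of the decimal point."""
    if value.is_zero():
        return 0
    # adjusted() is the exponent of the most significant digit.
    return max(value.adjusted() + 1, 0)


def to_decimal_type(value: Decimal, precision: int, scale: int) -> Optional[Decimal]:
    """Return value if its integer part fits decimal(precision, scale).

    None stands in for the NULL described in the comment (an assumption
    of this sketch, not a copy of Hive's implementation).
    """
    if integer_digits(value) > precision - scale:
        return None
    return value


product = Decimal("1524157875171467887.5019052100")

print(integer_digits(product))            # digits in the integer part
print(to_decimal_type(product, 31, 10))   # 31 - 10 = 21 integer digits allowed
print(to_decimal_type(product, 38, 20))   # 38 - 20 = 18 integer digits allowed
```

The extra cast raises the scale from 10 to 20 while the precision is capped at 38, so the room for integer digits shrinks from 21 to 18 and the 19-digit integer part no longer fits.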

> CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
> ----------------------------------------------------------------------------
>                 Key: HIVE-8111
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
> Original test failure: looks like column type changes to different decimals in most cases.
> In one case it causes the integer part to be too big to fit, so the result becomes null.
> What happens is that CBO adds casts to arithmetic expressions to make them type compatible;
> these casts become part of the new AST, and then Hive adds casts on top of these casts. This (the
> first part) also causes lots of out file changes, in addition to incorrect decimal widths and
> sometimes nulls when the width is larger than allowed in Hive. It's not clear how best to fix
> it so far.
> Option one - don't add those casts for numeric ops - cannot be done if the numeric op is part
> of a comparison, for which CBO needs correct types.
> Option two - unwrap casts when determining the type in Hive - it is hard or impossible to tell
> CBO-added casts apart from user casts.
> Option three - don't change types in Hive if CBO has run - seems hacky, and it is hard to ensure
> it's applied everywhere.
> Option four - map all expressions precisely between the two trees and remove the casts again
> after optimization - will be pretty difficult.
> Option five - somehow mark those casts. Not sure about how yet.
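
As an illustration only, option five could be pictured with a toy expression tree in Python. Nothing here is Hive or CBO code: the Column, Cast and Multiply classes, the added_by_cbo flag and the strip_cbo_casts function are all hypothetical names, and the sketch assumes the marker survives until Hive derives types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Cast:
    operand: "Expr"
    target_type: str
    added_by_cbo: bool = False  # hypothetical marker, not a real Hive field


@dataclass(frozen=True)
class Multiply:
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Cast, Multiply]


def strip_cbo_casts(expr: Expr) -> Expr:
    """Remove casts marked as CBO-added; keep user-written casts."""
    if isinstance(expr, Cast):
        inner = strip_cbo_casts(expr.operand)
        if expr.added_by_cbo:
            return inner
        return Cast(inner, expr.target_type, False)
    if isinstance(expr, Multiply):
        return Multiply(strip_cbo_casts(expr.left), strip_cbo_casts(expr.right))
    return expr


# key * CAST(value AS decimal(31,10)), where the cast was inserted by the optimizer.
cbo_tree = Multiply(Column("key"),
                    Cast(Column("value"), "decimal(31,10)", added_by_cbo=True))
# The same expression, but with the cast written by the user.
user_tree = Multiply(Column("key"), Cast(Column("value"), "decimal(31,10)"))

print(strip_cbo_casts(cbo_tree))
print(strip_cbo_casts(user_tree))
```

With a marker like this, the problem from option two (telling CBO-added casts from user casts) goes away, but the sketch says nothing about how such a flag would be carried through the real AST.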
