hive-issues mailing list archives

From "Mustafa Iman (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-14302) Tez: Optimized Hashtable can support DECIMAL keys of same precision
Date Wed, 09 Oct 2019 00:13:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated HIVE-14302:
--------------------------------
    Attachment: HIVE-14302.3.patch
        Status: Patch Available  (was: In Progress)

> Tez: Optimized Hashtable can support DECIMAL keys of same precision
> -------------------------------------------------------------------
>
>                 Key: HIVE-14302
>                 URL: https://issues.apache.org/jira/browse/HIVE-14302
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: 2.2.0
>            Reporter: Gopal Vijayaraghavan
>            Assignee: Mustafa Iman
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-14302.2.patch, HIVE-14302.3.patch, HIVE-14302.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Decimal support in the optimized hashtable was decided on the basis that Decimal(10,1) == Decimal(10,2) when the two contain "1.0" and "1.00".
> However, joins no longer have any issues with decimal precision, because both sides are cast to a common type.
> {code}
> create temporary table x (a decimal(10,2), b decimal(10,1)) stored as orc;
> insert into x values (1.0, 1.0);
> explain logical select count(1) from x, x x1 where x.a = x1.b;
> OK
> LOGICAL PLAN:
> $hdt$_0:$hdt$_0:x
>   TableScan (TS_0)
>     alias: x
>     filterExpr: (a is not null and true) (type: boolean)
>     Filter Operator (FIL_18)
>       predicate: (a is not null and true) (type: boolean)
>       Select Operator (SEL_2)
>         expressions: a (type: decimal(10,2))
>         outputColumnNames: _col0
>         Reduce Output Operator (RS_6)
>           key expressions: _col0 (type: decimal(11,2))
>           sort order: +
>           Map-reduce partition columns: _col0 (type: decimal(11,2))
>           Join Operator (JOIN_8)
>             condition map:
>                  Inner Join 0 to 1
>             keys:
>               0 _col0 (type: decimal(11,2))
>               1 _col0 (type: decimal(11,2))
>             Group By Operator (GBY_11)
>               aggregations: count(1)
>               mode: hash
>               outputColumnNames: _col0
> {code}
> Note the cast up to Decimal(11, 2) in the plan, which normalizes both sides of the join so that HiveDecimal values can be compared as-is.
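
The widening seen in the plan can be illustrated with a small sketch. It is not Hive's implementation: the helper names `common_decimal_type` and `normalize` are hypothetical, and the widening rule (keep the larger scale and the larger number of integer digits) is an assumption that happens to reproduce the decimal(11,2) shown in the plan for decimal(10,2) and decimal(10,1). Python's `decimal` module stands in for HiveDecimal here.

```python
from decimal import Decimal


def common_decimal_type(left, right):
    """Return a (precision, scale) pair wide enough for both inputs.

    Assumed rule (not taken from Hive's source): keep the larger scale
    and the larger count of integer digits, then add them together.
    """
    (left_precision, left_scale), (right_precision, right_scale) = left, right
    scale = max(left_scale, right_scale)
    integer_digits = max(left_precision - left_scale, right_precision - right_scale)
    return integer_digits + scale, scale


def normalize(value, scale):
    """Rescale a value to exactly `scale` fractional digits."""
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))


# Column types from the example table: a decimal(10,2), b decimal(10,1).
precision, scale = common_decimal_type((10, 2), (10, 1))

a = Decimal("1.00")  # value as stored in column a
b = Decimal("1.0")   # value as stored in column b

# Before normalization the digit/exponent representations differ,
# so a key built from the raw representation would not match.
raw_differs = a.as_tuple() != b.as_tuple()

# After rescaling both sides to the common scale they are identical.
normalized_matches = normalize(a, scale).as_tuple() == normalize(b, scale).as_tuple()
```

Once both join keys share one precision and scale, equal values have one representation, which is the property a hashtable keyed on the stored form of the value would rely on.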



--
This message was sent by Atlassian Jira
(v8.3.4#803005)