hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18421) Vectorized execution does not handle integer overflows
Date Wed, 10 Jan 2018 19:29:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16320926#comment-16320926 ]

Matt McCline commented on HIVE-18421:
-------------------------------------

Well, I hear you, but given how crucial performance is, it isn't that simple.  Since Java does
not have built-in support for detecting underflow/overflow (e.g. $OVERFLOW), you would end
up adding if statements (a Google search will show you some) to each arithmetic operation, and
those often defeat the fancy SIMD instructions and ruin performance.  And, even with
$OVERFLOW, that would probably be the case.
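
For illustration, here is a minimal Java sketch of the two styles of per-element check
being discussed. The class and method names are hypothetical and are not Hive code.
The first loop uses Math.subtractExact (available since Java 8), which throws on
overflow and so puts a conditional on every element. The second keeps the loop body
free of conditionals by OR-ing a sign-bit test into a flag that is examined once after
the loop. Whether a given JVM actually emits SIMD instructions for either loop is not
claimed here.

```java
// Hypothetical sketch, not Hive code: two ways to detect overflow in a long subtraction loop.
final class OverflowCheckSketch {

    // Per-element check: Math.subtractExact throws ArithmeticException on overflow.
    static void subtractExact(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = Math.subtractExact(a[i], b[i]);
        }
    }

    // Flag-accumulating check: results wrap as usual, and the method returns true
    // if any element overflowed. a - b overflows exactly when a and b have
    // different signs and the result's sign differs from a's.
    static boolean subtractFlagged(long[] a, long[] b, long[] out, int n) {
        long overflow = 0;
        for (int i = 0; i < n; i++) {
            long r = a[i] - b[i];
            out[i] = r;
            overflow |= (a[i] ^ b[i]) & (a[i] ^ r);
        }
        return overflow < 0; // sign bit set means at least one overflow occurred
    }
}
```

The flagged variant only reports that some element overflowed, not which one, so a
caller that needs per-row handling would still have to rescan the batch.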

One option might be to generate 2 sets of vectorization classes: checked and unchecked.

Writing the checked alternatives will take some care to make sure they are fast.  And, it
isn't just +/-, but it is also the sum and avg aggregations, etc.
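
As a sketch of what a checked/unchecked pair could look like for a sum aggregation,
assuming a plain long[] batch rather than Hive's actual vector classes (the names
below are hypothetical):

```java
// Hypothetical sketch, not Hive code: unchecked and checked sum over a batch of longs.
final class SumSketch {

    // Unchecked: wraps silently on overflow, as plain Java long addition does.
    static long sumUnchecked(long[] v, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += v[i];
        }
        return sum;
    }

    // Checked: same loop, plus a sign-bit test accumulated into a flag and
    // inspected once at the end. s + x overflows exactly when both operands
    // have a sign different from the result's.
    static long sumChecked(long[] v, int n) {
        long sum = 0;
        long overflow = 0;
        for (int i = 0; i < n; i++) {
            long r = sum + v[i];
            overflow |= (sum ^ r) & (v[i] ^ r);
            sum = r;
        }
        if (overflow < 0) {
            throw new ArithmeticException("long overflow in sum");
        }
        return sum;
    }
}
```

Note that the checked variant flags any intermediate overflow, even if later values
would bring the running total back into range.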

> Vectorized execution does not handle integer overflows
> ------------------------------------------------------
>
>                 Key: HIVE-18421
>                 URL: https://issues.apache.org/jira/browse/HIVE-18421
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 2.1.1, 2.2.0, 3.0.0, 2.3.2
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>
> In vectorized execution, arithmetic operations that cause integer overflows can give
> wrong results. The issue is reproducible in both ORC and Parquet.
> Simple test case to reproduce this issue:
> {noformat}
> set hive.vectorized.execution.enabled=true;
> create table parquettable (t1 tinyint, t2 tinyint) stored as parquet;
> insert into parquettable values (-104, 25), (-112, 24), (54, 9);
> select t1, t2, (t1-t2) as diff from parquettable where (t1-t2) < 50 order by diff desc;
> +-------+-----+-------+
> |  t1   | t2  | diff  |
> +-------+-----+-------+
> | -104  | 25  | 127   |
> | -112  | 24  | 120   |
> | 54    | 9   | 45    |
> +-------+-----+-------+
> {noformat}
> When vectorization is turned off the same query produces only one row.
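
The 127 and 120 in the diff column are the values that two's-complement narrowing to
8 bits gives for -129 and -136. The Java sketch below reproduces that arithmetic. It
assumes, which the report does not state, that the difference is evaluated in a wider
integer type and narrowed to tinyint only afterwards; the class and method names are
hypothetical.

```java
// Hypothetical sketch, not Hive code: the arithmetic behind the rows in the report.
final class TinyintWrapSketch {

    // Difference computed in 64 bits, so it cannot overflow for byte inputs.
    static long wideDiff(byte t1, byte t2) {
        return (long) t1 - (long) t2;
    }

    // Narrowing to 8 bits keeps only the low byte (two's-complement wraparound).
    static byte narrow(long wide) {
        return (byte) wide;
    }

    // The filter from the query, applied to the wide value.
    static boolean passesFilterWide(byte t1, byte t2) {
        return wideDiff(t1, t2) < 50;
    }

    // The same filter, applied after narrowing to tinyint.
    static boolean passesFilterNarrow(byte t1, byte t2) {
        return narrow(wideDiff(t1, t2)) < 50;
    }
}
```

Under that assumption, all three rows pass the filter on the wide value, while only
(54, 9) passes it after narrowing, which is consistent with the single row reported
when vectorization is off.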



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
