hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eric Hanson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
Date Wed, 11 Dec 2013 19:01:07 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845634#comment-13845634
] 

Eric Hanson commented on HIVE-5996:
-----------------------------------

-1

We should not be changing the output data types of expression results that are arguably reasonable.
It causes code churn and can break existing apps. 

Having sum(bigint) return bigint is long standing behavior in Hive and is reasonable.

As a side note, SQL Server returns bigint for sum(bigint).

If users need more digits, they can cast the input to sum to a decimal.

> Query for sum of a long column of a table with only two rows produces wrong result
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-5996
>                 URL: https://issues.apache.org/jira/browse/HIVE-5996
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.12.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-5996.patch
>
>
> {code}
> hive> desc test2;
> OK
> l                   	bigint              	None                
> hive> select * from test2;                                 
> OK
> 6666666666666666666
> 5555555555555555555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it happens only
with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +----------------------+
> | sum(l)               |
> +----------------------+
> | 12222222222222222221 |
> +----------------------+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate large number of rows. Overflowing with only two rows is very
unusable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Mime
View raw message