hive-dev mailing list archives

From "Nick Martin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9385) Sum a Double using an ORC table
Date Fri, 16 Jan 2015 03:36:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279769#comment-14279769 ]

Nick Martin commented on HIVE-9385:
-----------------------------------

[~damien.carol] I have ~150m rows of sales data in an ORC table, with the sales amount
stored in a double column. When I sum that column I get the value I reported
above (4.7...). The true sum of that field is ~$2.5b.

When I do the exact same thing (create the same table, store the sales column as a double,
sum on that column) but store the table as textfile, I get the correct amount.

So I think there's something going on with sum() on doubles in ORC tables, and I'm
hoping someone could give it a shot in their environment and let me know whether it appears
to be a bug.
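
For anyone trying this, a minimal sketch of the comparison described above might look like the
following. The names sales_staging, sales_orc, and sales_text are hypothetical placeholders
rather than the actual schema, and the extra COUNT/MIN/MAX columns are only an assumption about
what would help show whether the two tables hold the same rows.

{code:sql}
-- Hypothetical names: sales_staging is assumed to hold the raw sales rows,
-- with the sales amount in a DOUBLE column x.
CREATE TABLE sales_orc (x DOUBLE) STORED AS ORC;
CREATE TABLE sales_text (x DOUBLE) STORED AS TEXTFILE;

-- Load the same rows into both tables.
INSERT OVERWRITE TABLE sales_orc SELECT x FROM sales_staging;
INSERT OVERWRITE TABLE sales_text SELECT x FROM sales_staging;

-- Run the same aggregation against each storage format and compare.
SELECT SUM(x), COUNT(*), MIN(x), MAX(x) FROM sales_orc;
SELECT SUM(x), COUNT(*), MIN(x), MAX(x) FROM sales_text;
{code}

If the row counts and min/max values agree but the sums differ, that would point at the
aggregation over the ORC table rather than at the data itself.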

> Sum a Double using an ORC table
> -------------------------------
>
>                 Key: HIVE-9385
>                 URL: https://issues.apache.org/jira/browse/HIVE-9385
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>         Environment: HDP 2.x, Hive
>            Reporter: Nick Martin
>            Priority: Minor
>
> I’m storing a sales amount column as a double in an ORC table and when I do:
> {code:sql}
> select sum(x) from sometable
> {code}
> I get a value like {{4.79165141174808E9}}
> A visual inspection of the column values reveals no glaring anomalies…all looks pretty normal.
> If I do the same thing in a textfile table I get a perfectly fine aggregation of the double field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
