hive-dev mailing list archives

From "Dain Sundstrom (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
Date Wed, 05 Nov 2014 23:25:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199353#comment-14199353 ]

Dain Sundstrom commented on HIVE-8732:
--------------------------------------

The DoubleStatisticsImpl merge and update methods don't handle NaN properly.  Any ordered
comparison (<, >) with NaN returns false, so if the first value is NaN you end up with a min
and max of NaN, which implies that the column contains only NaNs.  We should consider tracking
NaN specially in the stats.
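
To illustrate the failure mode, here is a minimal sketch. NaiveDoubleStats is a hypothetical
stand-in written for this example, not the actual DoubleStatisticsImpl source; it only assumes
that min and max are maintained with plain < and > comparisons, as described above.

```java
// Hypothetical stand-in for a min/max tracker; NOT the real DoubleStatisticsImpl.
final class NaiveDoubleStats {
    private boolean hasValue = false;
    private double min;
    private double max;

    // Seeds min/max from the first value, then updates them with ordered comparisons.
    void update(double value) {
        if (!hasValue) {
            hasValue = true;
            min = value;
            max = value;
        } else if (value < min) {   // always false when min is NaN
            min = value;
        } else if (value > max) {   // always false when max is NaN
            max = value;
        }
    }

    double getMin() { return min; }
    double getMax() { return max; }
}

final class NaNStatsDemo {
    public static void main(String[] args) {
        NaiveDoubleStats stats = new NaiveDoubleStats();
        stats.update(Double.NaN);   // first value seeds both min and max
        stats.update(1.0);
        stats.update(100.0);
        // prints "NaN NaN" even though the column also holds 1.0 and 100.0
        System.out.println(stats.getMin() + " " + stats.getMax());
    }
}
```

Once min and max are seeded with NaN, no later value can replace them, so the stats claim a
column of nothing but NaNs.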

Regardless, for now any code reading the DoubleStatistic should discard a stat containing
a NaN.
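
A reader-side guard could look roughly like the sketch below. DoubleStatGuard and its methods
are hypothetical names made up for this example, not part of the ORC reader API; the idea is
only that a stat with a NaN bound is treated as unknown, so the reader never skips data
because of it.

```java
// Hypothetical helper for code that consumes double min/max stats; not an ORC API.
final class DoubleStatGuard {
    // A stat is usable only when neither bound is NaN.
    static boolean isUsable(double min, double max) {
        return !Double.isNaN(min) && !Double.isNaN(max);
    }

    // Answers "could this range hold a value greater than threshold?".
    // When the stat is unusable it is discarded and the answer is a conservative "yes".
    static boolean mightContainGreaterThan(double min, double max, double threshold) {
        if (!isUsable(min, max)) {
            return true;            // cannot rule anything out, so do not skip
        }
        return max > threshold;
    }
}
```

Without the isUsable check, max > threshold would evaluate to false for a NaN max and the
reader would wrongly skip data that may contain matching values.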

> ORC string statistics are not merged correctly
> ----------------------------------------------
>
>                 Key: HIVE-8732
>                 URL: https://issues.apache.org/jira/browse/HIVE-8732
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: HIVE-8732.patch
>
>
> Currently ORC's string statistics do not merge correctly causing incorrect maximum values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
