hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17186) `double` type constant operation loses precision
Date Thu, 27 Jul 2017 17:38:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103559#comment-16103559 ]

Gopal V commented on HIVE-17186:
--------------------------------

bq. Is there any way for Hive to fix this?

No, this is the {{0.1 + 0.2 != 0.3}} problem with IEEE 754 arithmetic.

Decimal, i.e. {{0.1BD + 0.2BD}}, wouldn't cause these rounding errors.
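
For illustration, a minimal sketch (assuming floating-point literals default to {{double}} here, as in the report, and that the {{BD}} decimal-literal suffix is available in this Hive version):

{code}
-- double arithmetic: the constants fold with IEEE 754 rounding error,
-- matching the bounds in the explain plan quoted below
hive> select 0.06 - 0.01, 0.06 + 0.01;
0.049999999999999996	0.06999999999999999

-- decimal arithmetic: BD literals keep the constants exact
hive> select 0.06BD - 0.01BD, 0.06BD + 0.01BD;
0.05	0.07
{code}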

> `double` type constant operation loses precision
> ------------------------------------------------
>
>                 Key: HIVE-17186
>                 URL: https://issues.apache.org/jira/browse/HIVE-17186
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dongjoon Hyun
>
> This might be an issue where Hive loses precision and generates wrong results when handling *double* constant operations. This was reported in the following environment.
> *ENVIRONMENT*
> https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql
> *SQL*
> {code}
> hive> explain select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> Plan not optimized by CBO.
> Stage-0
>    Fetch Operator
>       limit:10
>       Stage-1
>          Map 1 vectorized
>          File Output Operator [FS_9]
>             compressed:false
>             Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>             table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
>             Limit [LIM_8]
>                Number of rows:10
>                Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>                Select Operator [OP_7]
>                   outputColumnNames:["_col0"]
>                   Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator [FIL_6]
>                      predicate:l_discount BETWEEN 0.049999999999999996 AND 0.06999999999999999 (type: boolean)
>                      Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
>                      TableScan [TS_0]
>                         alias:lineitem
>                         Statistics:Num rows: 5999989709 Data size: 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> 0.06
> Time taken: 314.923 seconds, Fetched: 1 row(s)
> {code}
> Hive excludes 0.07, contrary to users' intuition. This difference also confuses some users, because they believe that Hive's result is the correct one. Is there any way for Hive to fix this?
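
As a hedged sketch of the decimal-literal workaround mentioned above (untested against this dataset), the reported query could use {{BD}} literals so the folded BETWEEN bounds stay exact:

{code}
-- with decimal literals the folded predicate should read:
--   l_discount BETWEEN 0.05 AND 0.07
hive> select max(l_discount) from lineitem
      where l_discount between 0.06BD - 0.01BD and 0.06BD + 0.01BD limit 10;
{code}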



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
