hive-dev mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-17186) `double` type constant operation loses precision
Date Thu, 27 Jul 2017 17:17:00 GMT
Dongjoon Hyun created HIVE-17186:
------------------------------------

             Summary: `double` type constant operation loses precision
                 Key: HIVE-17186
                 URL: https://issues.apache.org/jira/browse/HIVE-17186
             Project: Hive
          Issue Type: Bug
            Reporter: Dongjoon Hyun


This might be an issue where Hive loses precision and generates a wrong result when evaluating
arithmetic on *double* constants. It was reported in the following environment.

*ENVIRONMENT*
https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql

*SQL*
{code}
hive> explain select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
OK
Plan not optimized by CBO.

Stage-0
   Fetch Operator
      limit:10
      Stage-1
         Map 1 vectorized
         File Output Operator [FS_9]
            compressed:false
            Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Limit [LIM_8]
               Number of rows:10
               Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
               Select Operator [OP_7]
                  outputColumnNames:["_col0"]
                  Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator [FIL_6]
                     predicate:l_discount BETWEEN 0.049999999999999996 AND 0.06999999999999999 (type: boolean)
                     Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
                     TableScan [TS_0]
                        alias:lineitem
                        Statistics:Num rows: 5999989709 Data size: 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE

hive> select max(l_discount) from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
OK
0.06
Time taken: 314.923 seconds, Fetched: 1 row(s)
{code}
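The bounds in the Filter Operator's predicate are ordinary IEEE 754 double-precision results: neither 0.06 nor 0.01 is exactly representable in binary, so 0.06 - 0.01 rounds to just below 0.05 and 0.06 + 0.01 rounds to just below the double nearest 0.07. The sketch below reproduces this in Python, whose `float` is the same binary64 type as Java's `double` on standard CPython builds. The `between` helper and the `discounts` list are hypothetical illustrations (TPC-H discounts range over 0.00 to 0.10), not Hive code.

```python
# Fold the constant arithmetic in IEEE 754 double precision, as the plan shows.
lower = 0.06 - 0.01
upper = 0.06 + 0.01

print(repr(lower))  # 0.049999999999999996
print(repr(upper))  # 0.06999999999999999


def between(value, lo, hi):
    """Hypothetical inclusive range test mirroring SQL BETWEEN semantics."""
    return lo <= value <= hi


# Hypothetical sample of stored l_discount values, each the double nearest its decimal.
discounts = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]

# The stored 0.07 is slightly above 0.07, while the folded upper bound is
# slightly below it, so 0.07 falls outside the range.
matched = [d for d in discounts if between(d, lower, upper)]
print(matched)       # [0.05, 0.06]
print(max(matched))  # 0.06
```

This matches the `max(l_discount)` result of 0.06 above: 0.05 survives only because its stored double lies above the folded lower bound.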

Hive excludes 0.07, which runs counter to users' intuition. The discrepancy also confuses some
users, who assume that Hive's result is the correct one. Is there any way for Hive to fix this?
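One possible direction, sketched here purely as an illustration and not as how Hive does or should fold constants, is to evaluate literal arithmetic exactly in decimal and convert to double only once, when comparing against a DOUBLE column. The sketch assumes Python's `decimal` module as a stand-in for exact decimal arithmetic; `between` and `discounts` are the same hypothetical helpers as before, redefined so the block runs on its own.

```python
from decimal import Decimal


def between(value, lo, hi):
    """Hypothetical inclusive range test mirroring SQL BETWEEN semantics."""
    return lo <= value <= hi


# Hypothetical approach: fold 0.06 - 0.01 and 0.06 + 0.01 exactly in decimal,
# then round each result to a double a single time.
lower = float(Decimal("0.06") - Decimal("0.01"))
upper = float(Decimal("0.06") + Decimal("0.01"))

print(lower, upper)  # 0.05 0.07

# Hypothetical sample of stored l_discount values (doubles).
discounts = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
matched = [d for d in discounts if between(d, lower, upper)]
print(matched)       # [0.05, 0.06, 0.07]
```

The comparison itself still happens in double. It agrees with intuition here only because the folded bound and the stored column value are both the double nearest 0.07; comparing the stored double against an exact decimal 0.07 would exclude it again.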



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
