hive-dev mailing list archives

From "Johnny Zhang" <xiao...@cloudera.com>
Subject Re: Review Request: float and double calculation is inaccurate in Hive
Date Tue, 18 Dec 2012 01:00:40 GMT


> On Dec. 18, 2012, 12:38 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java, line 50
> > <https://reviews.apache.org/r/8653/diff/1/?file=240423#file240423line50>
> >
> >     10 seems to be a rather arbitrary number for scale. Any particular reason you are using it? Maybe we should invoke the method where no scale needs to be specified.

Hi Mark, thanks for reviewing it. The reason for using 10 is that it matches MySQL's default precision setting; I just want to make the calculation result identical to MySQL's.
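
For reference, the shape of the change is roughly as follows. This is just a minimal sketch with illustrative names, not the exact patch:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class DivideSketch {
        public static void main(String[] args) {
            // Divide with an explicit scale of 10 (HALF_UP rounding) instead of
            // plain double division; 10 matches the MySQL default mentioned above.
            BigDecimal quotient = new BigDecimal("48308.98")
                    .divide(new BigDecimal("10"), 10, RoundingMode.HALF_UP);
            System.out.println(quotient);  // 4830.8980000000
        }
    }

Note that divide() without a scale throws ArithmeticException whenever the exact quotient does not terminate (e.g. 1/3), so some scale has to be picked.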


> On Dec. 18, 2012, 12:38 a.m., Mark Grover wrote:
> > http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java, line 112
> > <https://reviews.apache.org/r/8653/diff/1/?file=240424#file240424line112>
> >
> >     You seem to be doing
> >     DoubleWritable->String->BigDecimal
> >     
> >     There probably is a way to do:
> >     DoubleWritable->Double->BigDecimal
> >     
> >     I am not sure if it's any more efficient than the present case. So, take this suggestion with a grain of salt :-)
> >

The reason for using the constructor with a String parameter is that the constructor with a double parameter would reduce the precision before the calculation. There is a similar discussion regarding this at http://www.coderanch.com/t/408226/java/java/Double-BigDecimal-Conversion-problems

"you will see the difference between creating an instance using a double (whose precision
has already been compromised by forcing it into IEEE 754 standards) and creating an instance
using a String (which can be translated accurately). "
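
A standalone snippet (not taken from the patch) that shows the difference:

    import java.math.BigDecimal;

    public class ConstructorSketch {
        public static void main(String[] args) {
            // The double constructor captures the binary approximation of 0.1:
            // 0.1000000000000000055511151231257827021181583404541015625
            System.out.println(new BigDecimal(0.1));
            // The String constructor keeps the decimal value exactly: 0.1
            System.out.println(new BigDecimal("0.1"));
        }
    }

For what it's worth, BigDecimal.valueOf(double) goes through Double.toString() internally, so it avoids the same pitfall without building the String by hand.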


- Johnny


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8653/#review14625
-----------------------------------------------------------


On Dec. 18, 2012, 12:37 a.m., Johnny Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8653/
> -----------------------------------------------------------
> 
> (Updated Dec. 18, 2012, 12:37 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Description
> -------
> 
> I found this while debugging the e2e test failures: Hive miscalculates float and double values. Take float calculation as an example:
> hive> select f from all100k limit 1;
> 48308.98
> hive> select f/10 from all100k limit 1;
> 4830.898046875 <-- extra 046875 at the end
> hive> select f*1.01 from all100k limit 1;
> 48792.0702734375 <-- should be 48792.0698
> It might be essentially the same problem as http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm. But since the e2e tests compare results with MySQL, and MySQL seems to get this right, it is worth fixing in Hive. (A plain-Java reproduction of this follows the quoted request below.)
> 
> 
> This addresses bug HIVE-3715.
>     https://issues.apache.org/jira/browse/HIVE-3715
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java 1423224
>   http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java 1423224
> 
> Diff: https://reviews.apache.org/r/8653/diff/
> 
> 
> Testing
> -------
> 
> I tested by comparing the results against MySQL with its default float precision setting; the results are identical.
> 
> query:          select f, f*1.01, f/10 from all100k limit 1;
> mysql result:   48309       48792.0702734375    4830.898046875
> hive result:    48308.98    48792.0702734375	4830.898046875
> 
> 
> I applied this patch and ran the Hive e2e tests, and they all pass (without this patch, there are 5 related failures).
> 
> 
> Thanks,
> 
> Johnny Zhang
> 
>
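
For anyone who wants to reproduce the effect described in the quoted request outside of Hive, here is a small illustrative snippet (not part of the patch). 48308.98 has no exact binary float representation, so the stored value is already off before any arithmetic happens:

    public class FloatSketch {
        public static void main(String[] args) {
            float f = 48308.98f;                  // nearest float is 48308.98046875
            System.out.println((double) f);       // 48308.98046875
            System.out.println((double) f / 10);  // 4830.898046875
            // f * 1.01 is skewed in the same way, which is where the
            // 48792.0702... result (instead of 48792.0698) comes from.
        }
    }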

