hive-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: Hive double-precision question
Date Tue, 18 Dec 2012 02:08:44 GMT
Doubles are not exact fractional numbers. Because every addition rounds its
result, adding the same set of doubles in a different order can produce
different results (e.g., (a+b)+c != (b+c)+a).

Because of this, if your computation is happening in a different order
locally than on the Hive server, you might end up with different results.
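
To illustrate the ordering effect, here is a minimal Java sketch. The class and method names are made up for this example and are not taken from Hive; it only shows that summing the same three doubles in two different orders can give different totals.

```java
public class SumOrder {
    // Adds the values left to right, the way a simple loop would.
    // (Hypothetical helper, for illustration only.)
    static double sumInOrder(double... values) {
        double total = 0.0;
        for (double v : values) {
            total += v; // each addition rounds to the nearest double
        }
        return total;
    }

    public static void main(String[] args) {
        // Same three values, two different orders.
        System.out.println(sumInOrder(0.1, 0.2, 0.3)); // 0.6000000000000001
        System.out.println(sumInOrder(0.2, 0.3, 0.1)); // 0.6
    }
}
```

The difference here is in the last digit only, but over many additions such differences can accumulate.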

I don't think Hive supports a native decimal type, unfortunately, so it's
difficult to verify this.
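
Outside Hive, one way to check whether summation order is the culprit is to redo the sum exactly in plain Java. The sketch below is an assumption about how one might do that locally, not something Hive provides: `new BigDecimal(double)` keeps the exact binary value of each double and `BigDecimal.add` does not round, so the total is the same in any order. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class ExactSum {
    // Sums doubles without intermediate rounding.
    // (Hypothetical helper, for illustration only.)
    static BigDecimal exactSum(double... values) {
        BigDecimal total = BigDecimal.ZERO;
        for (double v : values) {
            // new BigDecimal(double) converts the exact binary value of v
            total = total.add(new BigDecimal(v));
        }
        return total;
    }

    public static void main(String[] args) {
        BigDecimal first = exactSum(0.1, 0.2, 0.3);
        BigDecimal second = exactSum(0.2, 0.3, 0.1);
        // compareTo returns 0 when the two totals are numerically equal
        System.out.println(first.compareTo(second) == 0); // true
    }
}
```

If the exact totals agree while the double totals differ, the discrepancy comes from rounding order rather than from the input data.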

--Tom

On Monday, December 17, 2012, Johnny Zhang wrote:

> Hi, Periya:
> Can you take a look at the patch for
> https://issues.apache.org/jira/browse/HIVE-3715 and see if you can apply
> a similar change to make sin/cos more accurate for your use case? Feel
> free to comment on the JIRA as well. Thanks.
>
> Johnny
>
>
> On Sat, Dec 8, 2012 at 11:23 AM, Periya.Data <periya.data@gmail.com> wrote:
>
>> Hi Lauren and Zhang,
>>     The book "Programming Hive" suggests using double (instead of float)
>> and casting any literal value to double. I am already using double for
>> all my computations (both in the Hive table schema and in my UDF).
>> Furthermore, I am not comparing two floats/doubles. I am doing some
>> computations involving doubles...and those minor differences are adding up.
>>
>> It looks like this may be what Mark Grover was describing: the mapping
>> between Java data types and Hive data types. I have yet to look at that
>> portion of the source code.
>>
>> Thanks and will keep you posted,
>> /PD
>>
>>
>>
>> On Fri, Dec 7, 2012 at 2:12 PM, Lauren Yang <Lauren.Yang@microsoft.com> wrote:
>>
>> This sounds like https://issues.apache.org/jira/browse/HIVE-2586,
>> where comparing floats/doubles will not work because of the way floating
>> point numbers are represented.
>>
>> Perhaps there is a comparison between a float and a double type because of
>> some internal representation in the Java library, or the UDF.
>>
>> Ed Capriolo’s book has a good section about workarounds and caveats for
>> working with floats/doubles in Hive.
>>
>> Thanks,
>> Lauren
>>
>> *From:* Periya.Data [mailto:periya.data@gmail.com]
>> *Sent:* Friday, December 07, 2012 1:28 PM
>> *To:* user@hive.apache.org; cdh-user@cloudera.org
>> *Subject:* Hive double-precision question
>>
>> Hi Hive Users,
>>     I recently noticed an interesting behavior with Hive and I am unable
>> to find the reason for it. Any insights would be much appreciated.
>>
>> I am trying to compute the distance between two zip codes. I have the
>> distances computed on various 'platforms': SAS, R, Linux+Java, a Hive UDF,
>> and Hive's built-in functions. The output from the Hive UDF and Hive's
>> built-in functions differs from the others starting at the 3rd decimal
>> place. Here is an example:
>>
>> zip1   zip2   Hadoop built-in function  SAS         R                 Linux + Java
>> 00501  11720  4.49493083698542000       4.49508858  4.49508858054005
>>
>> --
>>
>>
>>
>>
>
>
