hive-user mailing list archives

From "r7raul1984@163.com" <r7raul1...@163.com>
Subject Re: Re: join on different data type
Date Mon, 04 May 2015 07:53:58 GMT
Thank you!



r7raul1984@163.com
 
From: Gopal Vijayaraghavan
Date: 2015-05-04 16:10
To: user
CC: r7raul1984@163.com
Subject: Re: join on different data type
 
> If A.col1 is of DOUBLE type,
> but B.col2 is of BIGINT,
 
 
The automatic conversion is not safe: according to the Java Language
Specification (section 5.1.2), widening a long (BIGINT) to a double can
lose precision.
 
https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html#jls-5.1.2
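A minimal sketch of that precision loss, written in Python because its float is an IEEE 754 double like Java's. The variable names are illustrative only and nothing here is Hive-specific: two distinct integers above 2**53 become the same double, so a join that silently widened BIGINT keys to DOUBLE could match rows whose keys differ.

```python
# Two distinct integers that both fit in a 64-bit BIGINT.
a = 2**53        # 9007199254740992
b = 2**53 + 1    # 9007199254740993

# Widen both to an IEEE 754 double, as an implicit BIGINT -> DOUBLE
# conversion would.
as_double_a = float(a)
as_double_b = float(b)

distinct_as_int = a != b                         # the integers differ
collide_as_double = as_double_a == as_double_b   # the doubles do not

print(distinct_as_int, collide_as_double)
```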
 
 
Also note that, in general, even if you cast, you might be casting the
wrong way around.
 
Joins on double columns will give incorrect (or rather, unintended but
IEEE 754 correct) results when comparing byte-serialized representations,
because two values that are equal to within epsilon can still differ in
their bits.
 
The easiest way to demonstrate this is to try the simplest off-by-epsilon
case (say, in Python):
 
>>> import sys
>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 > 0.3
True
>>> ((0.1+0.2) - 0.3) < sys.float_info.epsilon
True
 
 
So if the RHS values were produced in ETL by sum() and the LHS values were
produced by parsing log text, the JOIN can output zero rows even though the
values look equal.
 
If you want to do equijoins like that, the only valid case is to cast both
sides to fixed-precision bigints (say, convert all dollars to cents by
multiplying by 100), not both to double.
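A rough sketch of the difference, again in Python rather than HiveQL. The helper to_cents and the sample amounts are hypothetical, and the amounts are assumed to arrive as decimal text with at most two fractional digits. Comparing the values as doubles fails to match, while comparing them as integer cents matches.

```python
from decimal import Decimal


def to_cents(dollars_text):
    """Convert a decimal dollar string to a whole number of cents (hypothetical helper)."""
    return int(Decimal(dollars_text) * 100)


# Hypothetical inputs: one side parsed from log text, the other summed in ETL.
lhs_text = "0.30"
rhs_parts = ["0.10", "0.20"]

# Join key as double: the sum is off by one ulp, so equality fails.
lhs_double = float(lhs_text)
rhs_double = sum(float(p) for p in rhs_parts)
double_match = lhs_double == rhs_double

# Join key as integer cents: the arithmetic is exact, so equality holds.
lhs_cents = to_cents(lhs_text)
rhs_cents = sum(to_cents(p) for p in rhs_parts)
cents_match = lhs_cents == rhs_cents

print(double_match, cents_match)
```

If a column is already stored as a double, rounding to the nearest cent is probably safer than truncating when converting it, since the stored value may sit slightly below the intended amount.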
 
Cheers,
Gopal
 
 