hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Low performance map join when join key types are different
Date Tue, 22 Dec 2015 18:36:42 GMT

> We found that when we join on keys of two different types, Hive will
>convert all join keys to Double.

This is because of the type coercions for BaseCompare, which are there so
that String:Integer comparisons with "<=" behave consistently with "=".

> b.id to double. When the conversion occurs, map join will become very
>slow.
...
> Does anyone know how to solve it more effectively?

This is an issue that only affects mapreduce mode in Hive. The broadcast
joins in Tez switched to Murmur hash to avoid this issue (HIVE-6924 +
HIVE-7121).

As a workaround, you can insert explicit casts to String to make this
faster.
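
A rough sketch of what that looks like, assuming a.id is a STRING and b.id
is an INT. The table name "a", the column "name" and both column types are
made up for illustration; only b.id comes from your mail.

```sql
-- Hypothetical schema: a.id is STRING, b.id is INT.

-- Before: the key types differ, so both sides are coerced to DOUBLE.
SELECT a.id, b.name
FROM a
JOIN b ON a.id = b.id;

-- After: cast the non-string key explicitly, so both keys are STRING
-- and the join no longer goes through the DOUBLE conversion.
SELECT a.id, b.name
FROM a
JOIN b ON a.id = CAST(b.id AS STRING);
```

Keep in mind that string keys match on exact text, so a value like '007'
will no longer join with the integer 7 the way it does under the Double
coercion. Check that your key values are formatted consistently first.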

Cheers,
Gopal


