hive-user mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: Low performance map join when join key types are different
Date Wed, 23 Dec 2015 06:24:18 GMT

> But why does disabling mapjoin give better performance when we don't cast to
> string (users are always lazy)?
> Is the join key comparison in the reduce stage faster?

The HashMap<DoubleWritable, RowContainer> is slower than the full-sort +

It shouldn't be, but it hits the worst-case performance of the HashMap
implementation because of a bug in DoubleWritable in Hadoop.

The effect is somewhat the same as

public int hashCode() {
   return 1;
}
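
To make the effect concrete, here is a minimal sketch. It assumes two hypothetical key classes, BadKey and GoodKey, which are illustrations only and not Hadoop's DoubleWritable. BadKey returns a constant hash, so every key lands in the same HashMap bucket; GoodKey hashes the double with Double.hashCode, so the keys spread out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ConstantHashDemo {

    /** Hypothetical key with a constant hashCode(): every instance collides. */
    static final class BadKey {
        final double value;
        BadKey(double value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && Double.compare(((BadKey) o).value, value) == 0;
        }
        @Override public int hashCode() { return 1; }
    }

    /** Hypothetical key that mixes all 64 bits of the double into the hash. */
    static final class GoodKey {
        final double value;
        GoodKey(double value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && Double.compare(((GoodKey) o).value, value) == 0;
        }
        @Override public int hashCode() { return Double.hashCode(value); }
    }

    static BadKey[] badKeys(int n) {
        BadKey[] keys = new BadKey[n];
        for (int i = 0; i < n; i++) keys[i] = new BadKey(i);
        return keys;
    }

    static GoodKey[] goodKeys(int n) {
        GoodKey[] keys = new GoodKey[n];
        for (int i = 0; i < n; i++) keys[i] = new GoodKey(i);
        return keys;
    }

    /** Number of distinct hash codes produced by the given keys. */
    static int distinctHashes(Object[] keys) {
        Set<Integer> seen = new HashSet<>();
        for (Object k : keys) seen.add(k.hashCode());
        return seen.size();
    }

    /** Inserts every key into a HashMap, looks each one up, returns elapsed nanoseconds. */
    static long timeInsertAndLookupNanos(Object[] keys) {
        long start = System.nanoTime();
        Map<Object, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.length; i++) map.put(keys[i], i);
        for (Object k : keys) {
            if (map.get(k) == null) throw new IllegalStateException("missing key");
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("distinct hashes (bad):  " + distinctHashes(badKeys(n)));
        System.out.println("distinct hashes (good): " + distinctHashes(goodKeys(n)));
        System.out.println("bad  ns: " + timeInsertAndLookupNanos(badKeys(n)));
        System.out.println("good ns: " + timeInsertAndLookupNanos(goodKeys(n)));
    }
}
```

The timings depend on the JVM and machine, so treat them as indicative only. The distinct-hash counts are the deterministic part: one bucket's worth of hashes for BadKey versus one per key for GoodKey.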

Read the comments on -

