hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Low performance map join when join key types are different
Date Wed, 23 Dec 2015 06:24:18 GMT

> But why does disabling mapjoin give better performance when we don't cast to
> string (user always lazy)?
>
> Is the join key comparison in the reduce stage faster?

The HashMap<DoubleWritable, RowContainer> used by the map join is slower than
the full sort + sorted-merge join.

It shouldn't be, but it hits the worst-case performance of the HashMap
implementation because of a hashCode() bug in Hadoop's DoubleWritable.

The effect is roughly the same as if hashCode() were written as

public int hashCode() {
   return 1;
}

Read the comments on https://issues.apache.org/jira/browse/HADOOP-12217 for
details.
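
To make the collision effect concrete, here is a minimal standalone sketch. It
does not use Hadoop at all. It assumes, as my reading of the JIRA and not as a
copy of the Hadoop source, that the problematic hash simply truncates the
64-bit representation of the double to its low 32 bits. The class and method
names (DoubleHashDemo, truncatingHash, foldingHash, distinctHashes) are made up
for illustration. For whole-number doubles such as 1.0, 2.0, 3.0 the low 32
bits are all zero, so every key gets the same hash, while folding the high bits
in (the approach java.lang.Double.hashCode documents) keeps them distinct.

```java
import java.util.HashSet;
import java.util.Set;

class DoubleHashDemo {
    // Hypothetical stand-in for the buggy behavior (an assumption, not the
    // Hadoop source): keep only the low 32 bits of the IEEE-754 bit pattern.
    static int truncatingHash(double v) {
        return (int) Double.doubleToLongBits(v);
    }

    // XOR the high 32 bits into the low 32 bits, the same formula that
    // java.lang.Double.hashCode documents.
    static int foldingHash(double v) {
        long bits = Double.doubleToLongBits(v);
        return (int) (bits ^ (bits >>> 32));
    }

    // Count the distinct hash values produced for the keys 1.0 .. n.
    static int distinctHashes(int n, boolean truncating) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 1; i <= n; i++) {
            double key = i;
            seen.add(truncating ? truncatingHash(key) : foldingHash(key));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("truncating: " + distinctHashes(100000, true));
        System.out.println("folding:    " + distinctHashes(100000, false));
    }
}
```

With a single distinct hash value, every join key ends up in one HashMap
bucket and each probe degrades from a constant-time lookup to a scan (or tree
walk) over all the colliding entries, which is why the sort-based join can win.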

Cheers,
Gopal






