hive-user mailing list archives

From Jov <zhao6...@gmail.com>
Subject Re: skew join optimization
Date Sun, 20 Mar 2011 13:07:04 GMT
2011/3/20 Igor Tatarinov <igor@decide.com>:
> I have the following join that takes 4.5 hours (with 12 nodes) mostly
> because of a single reduce task that gets the bulk of the work:
> SELECT ...
> FROM T
> LEFT OUTER JOIN S
> ON T.timestamp = S.timestamp and T.id = S.id
> This is a 1:0/1 join so the size of the output is exactly the same as the
> size of T (500M records). S is actually very small (5K).
> I've tried:
> - switching the order of the join conditions
> - using a different hash function setting (jenkins instead of murmur)
> - using SET hive.auto.convert.join = true;

Are you sure your query is actually being converted to a map join? If not,
try using an explicit MAPJOIN hint.
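
A minimal sketch of what the hint could look like on the query above, assuming
S (the 5K-row table) is the one to be loaded into memory on each mapper. The
select list is left elided, as in the original message:

```sql
-- Hint Hive to hash the small table S in memory and join on the map side,
-- so that no single reducer has to handle the skewed key.
SELECT /*+ MAPJOIN(S) */ ...
FROM T
LEFT OUTER JOIN S
ON T.timestamp = S.timestamp AND T.id = S.id;
```

To check whether the conversion happened, prefixing the query with EXPLAIN
should show a Map Join Operator in the plan rather than a reduce-side Join
Operator; the exact plan output may differ between Hive versions.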


> - using SET hive.optimize.skewjoin = true;
> but nothing helped :(
> Anything else I can try?
> Thanks!
