hive-user mailing list archives

From bharath vissapragada <bharathvissapragada1...@gmail.com>
Subject Re: skew join optimization
Date Sun, 20 Mar 2011 14:15:54 GMT
Hi Igor,

See http://wiki.apache.org/hadoop/Hive/JoinOptimization and HIVE-1642,
which automatically converts a normal join into a map-join (otherwise
you can specify a MAPJOIN hint in the query itself). Because your 'S'
table is very small, it can be replicated to all the mappers and the
reduce phase can be avoided. This can greatly reduce the runtime (see
the results section of that page for details).
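
For reference, here is a minimal sketch of the explicit hint applied to
the join from the original question. The select list was elided in the
original query, so T.* below is only a stand-in (an assumption), and the
second statement is just one way to check whether the plan really uses a
map-join.

```sql
-- Sketch only: T.* stands in for the select list elided ("...") in the
-- original query.
-- MAPJOIN(S) asks Hive to load the small table S into memory on each
-- mapper, so the join runs map-side and the skewed reduce step is skipped.
SELECT /*+ MAPJOIN(S) */ T.*
FROM T
LEFT OUTER JOIN S
  ON (T.timestamp = S.timestamp AND T.id = S.id);

-- To see which join Hive actually plans, prefix the query with EXPLAIN
-- and look for a map join operator in the plan output.
EXPLAIN
SELECT /*+ MAPJOIN(S) */ T.*
FROM T
LEFT OUTER JOIN S
  ON (T.timestamp = S.timestamp AND T.id = S.id);
```

With a LEFT OUTER JOIN the hinted table has to be the right-hand one (S
here), which matches this query since T is the large side that gets
streamed.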

Hope this helps.

Thanks


On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6014@gmail.com> wrote:
> 2011/3/20 Igor Tatarinov <igor@decide.com>:
>> I have the following join that takes 4.5 hours (with 12 nodes) mostly
>> because of a single reduce task that gets the bulk of the work:
>> SELECT ...
>> FROM T
>> LEFT OUTER JOIN S
>> ON T.timestamp = S.timestamp and T.id = S.id
>> This is a 1:0/1 join so the size of the output is exactly the same as the
>> size of T (500M records). S is actually very small (5K).
>> I've tried:
>> - switching the order of the join conditions
>> - using a different hash function setting (jenkins instead of murmur)
>> - using SET hive.auto.convert.join = true;
>
> Are you sure your query is converted to a mapjoin? If not, try using an
> explicit mapjoin hint.
>
>
>> - using SET hive.optimize.skewjoin = true;
>> but nothing helped :(
>> Anything else I can try?
>> Thanks!
>



-- 
Regards,
Bharath .V
w:http://research.iiit.ac.in/~bharath.v
