hive-user mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: skew join optimization
Date Mon, 21 Mar 2011 00:29:38 GMT
Skew join does not work together with map join.

Map join does not require any reducers.
Please double-check that the Hive you use has the auto map join feature.
If your Hive has auto convert join, then
SET hive.auto.convert.join = true; alone should do the work.
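
For reference, the two routes discussed in this thread (automatic conversion and an explicit mapjoin hint) might look roughly like the sketch below. The table names T and S and the join keys come from the original question; the select list is a placeholder, since the original query elides it, and whether the automatic conversion actually fires depends on the Hive version in use.

```sql
-- Sketch only: T, S, and the join keys are taken from the original question.
-- The select list is a placeholder because the original query elides it.

-- Option 1: let Hive convert the join automatically
-- (requires a Hive build that has the auto map join feature).
SET hive.auto.convert.join = true;
-- Skew join does not work together with map join, so leave it off here.
SET hive.optimize.skewjoin = false;

-- Option 2: request the map join explicitly with a hint naming the small table.
SELECT /*+ MAPJOIN(S) */ T.id, T.timestamp
FROM T
LEFT OUTER JOIN S
  ON T.timestamp = S.timestamp AND T.id = S.id;
```

Prefixing the query with EXPLAIN should show whether the plan contains a map join operator instead of a reduce-side join; if it does not, the conversion did not happen.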

thanks
yongqiang
On Sun, Mar 20, 2011 at 8:22 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> On Sun, Mar 20, 2011 at 11:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> How about link to http://imageshack.us/ or TinyPic ?
>>
>> Thanks
>>
>> On Sun, Mar 20, 2011 at 7:56 AM, Edward Capriolo <edlinuxguru@gmail.com>
>> wrote:
>>>
>>> On Sun, Mar 20, 2011 at 10:30 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> > Can someone re-attach the missing figures for that wiki ?
>>> >
>>> > Thanks
>>> >
>>> > On Sun, Mar 20, 2011 at 7:15 AM, bharath vissapragada
>>> > <bharathvissapragada1990@gmail.com> wrote:
>>> >>
>>> >> Hi Igor,
>>> >>
>>> >> See http://wiki.apache.org/hadoop/Hive/JoinOptimization and see
>>> >> JIRA 1642, which automatically converts a normal join into a map join
>>> >> (otherwise you can specify the mapjoin hint in the query itself).
>>> >> Because your 'S' table is very small, it can be replicated across all
>>> >> the mappers and the reduce phase can be avoided. This can greatly
>>> >> reduce the runtime (see the results section on the page for
>>> >> details).
>>> >>
>>> >> Hope this helps.
>>> >>
>>> >> Thanks
>>> >>
>>> >>
>>> >> On Sun, Mar 20, 2011 at 6:37 PM, Jov <zhao6014@gmail.com> wrote:
>>> >> > 2011/3/20 Igor Tatarinov <igor@decide.com>:
>>> >> >> I have the following join that takes 4.5 hours (with 12 nodes), mostly
>>> >> >> because of a single reduce task that gets the bulk of the work:
>>> >> >> SELECT ...
>>> >> >> FROM T
>>> >> >> LEFT OUTER JOIN S
>>> >> >> ON T.timestamp = S.timestamp and T.id = S.id
>>> >> >> This is a 1:0/1 join so the size of the output is exactly the same as
>>> >> >> the size of T (500M records). S is actually very small (5K).
>>> >> >> I've tried:
>>> >> >> - switching the order of the join conditions
>>> >> >> - using a different hash function setting (jenkins instead of murmur)
>>> >> >> - using SET hive.auto.convert.join = true;
>>> >> >
>>> >> > Are you sure your query is converted to a mapjoin? If not, try using
>>> >> > an explicit mapjoin hint.
>>> >> >
>>> >> >
>>> >> >> - using SET hive.optimize.skewjoin = true;
>>> >> >> but nothing helped :(
>>> >> >> Anything else I can try?
>>> >> >> Thanks!
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Regards,
>>> >> Bharath .V
>>> >> w:http://research.iiit.ac.in/~bharath.v
>>> >
>>> >
>>>
>>> The wiki does not allow images. Confluence does, but we have not moved
>>> there yet.
>>
>>
>
> Not a bad option, but lame that the foundation with the most popular
> web server in the world has to host our images elsewhere :(
>
