hive-user mailing list archives

From Ajo Fod <ajo....@gmail.com>
Subject Re: Mapjoin Usage Question
Date Thu, 20 Jan 2011 14:23:34 GMT
It probably depends on how big the "big" table actually is, i.e. whether
it can still be held in memory.
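
For reference, a minimal sketch of the usual form of the hint, reusing the
t1/t2 names from the original question. It assumes t2 is the table small
enough to fit in memory and that both tables have an id column; the aliases
a and b are only illustrative. As far as I understand it, the table named in
the hint is the one that gets loaded into memory, and the other table is
streamed through the mappers:

```sql
-- Assumption: t1 is the large table, t2 is small enough to fit in memory.
-- The hint names the alias (b) of the table to hold in memory;
-- the other table (a) is streamed.
SELECT /*+ MAPJOIN(b) */ *
FROM t1 a
JOIN t2 b ON (a.id = b.id);
```

If naming the bigger table in the hint ran faster in a test, that would
suggest that table also fits in memory on your cluster.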

-Ajo

On Wed, Jan 19, 2011 at 11:23 PM, hadoop n00b <new2hive@gmail.com> wrote:
> Thanks Leo,
>
> Does the smaller table go into the mapjoin hint? Actually, when I ran a test
> query with the bigger table in the hint, it performed better.
>
> On Thu, Jan 20, 2011 at 12:40 PM, Leo Alekseyev <dnquark@gmail.com> wrote:
>>
>> You can only specify one table, and make sure to include its name,
>> i.e. /*+ mapjoin(t2)*/. For more info see
>> http://wiki.apache.org/hadoop/Hive/JoinOptimization and
>> http://www.slideshare.net/aiolos127/join-optimization-in-hive.
>>
>> Also, you are using a relatively old version of Hive, but I'll let
>> more experienced people on this list decide whether that's a problem
>> :)
>>
>> On Thu, Jan 20, 2011 at 2:00 AM, hadoop n00b <new2hive@gmail.com> wrote:
>> > Hi,
>> >
>> > How do I use the mapjoin hint in a query?
>> >
>> > Say, I have two tables t1 and t2 where t2 is the smaller table. Do I
>> > specify t2 in the mapjoin hint?
>> >
>> > select /*+ mapjoin(b)*/ * from t1 a join t2 b on (a.id = b.id)
>> >
>> > If I am joining two smaller tables, can I specify two clauses in the
>> > mapjoin? /*+mapjoin(b,c)*/?
>> >
>> > I am unable to find much documentation on this. I am using CDH2 with
>> > Hive 0.4.1
>> >
>> > Thanks!
>
>
