hive-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: STREAMTABLE And MAPJOIN
Date Tue, 03 Dec 2013 09:40:06 GMT
This is my understanding of both. Wait for the Hive gurus to correct me if
I made any mistakes.


In Hive, when an inner join query runs, the table in the last (rightmost)
position of the join is streamed through the reducers, while the other
tables are buffered. This is the default behavior.

So say you have a query select blah blah from t1 join t2 join t3 join t4
on (blah blah).
The maps emit key-value pairs for all four tables and send them to the
reducers. There, the records from t1, t2 and t3 are buffered in memory, but
the records from t4 are streamed through for better memory management.
That's why it's advised to put the largest table on the right.

This default behavior can be changed with the STREAMTABLE(t1) hint, which
lets you tell Hive which table's data you want to be streamed.
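
As a rough sketch of how the hint is written, assuming every table has a
join column named key and a column named val (both column names are made up
for illustration), the query above could stream t1 instead of t4 like this:

```sql
-- Hypothetical columns: key (join column) and val.
-- Without the hint, t4 (the last table) would be streamed and
-- t1, t2, t3 buffered. The hint asks Hive to stream t1 instead.
SELECT /*+ STREAMTABLE(t1) */ t1.val, t2.val, t3.val, t4.val
FROM t1
JOIN t2 ON (t1.key = t2.key)
JOIN t3 ON (t2.key = t3.key)
JOIN t4 ON (t3.key = t4.key);
```

The hint goes inside a /*+ ... */ comment right after SELECT and names the
table (or its alias) that should be streamed.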

On the other hand, mapjoin is a concept where no reducers are involved.
It's a join where the smaller table is buffered into the memory of each
map, and the joins are then performed by the maps themselves. As the
smaller table's data is available in memory and the reduce step is
completely removed, these jobs are very fast.
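
A minimal sketch of the mapjoin hint, assuming a large table big_t and a
small table small_t that share a column named key (the table and column
names are hypothetical):

```sql
-- Hypothetical tables: big_t (large) and small_t (fits in memory).
-- The hint names the table to load into memory on each map,
-- so the join is done map-side without a reduce step.
SELECT /*+ MAPJOIN(s) */ b.key, b.val, s.val
FROM big_t b
JOIN small_t s ON (b.key = s.key);
```

Depending on the Hive version and settings, Hive may also convert a join to
a mapjoin on its own (see hive.auto.convert.join), so it is worth checking
the plan with EXPLAIN to confirm what actually runs.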


On Tue, Dec 3, 2013 at 2:47 PM, Baahu <bahubali@gmail.com> wrote:

> Hi,
> What is the difference between the hints STREAMTABLE and MAPJOIN?
>
> Thanks,
> Baahu
>
>


-- 
Nitin Pawar
