hive-user mailing list archives

From Yin Huai <>
Subject Re: single MR stage for join and group by
Date Fri, 02 Aug 2013 04:14:02 GMT
If the join is a reduce-side join, will optimize this query
and generate a single MR job. The optimizer introduced by HIVE-2206 is in
trunk. Currently, it only handles the case where the join and the group by
use the same column(s).
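Here is a rough illustration of why one job is enough when the join and the group by share a column. The Python simulation below runs a single MapReduce pass for the query in this thread. It assumes A holds (id, value) rows and B holds id rows, and it is only a sketch of the idea, not Hive's implementation. Mappers tag each row with its source table and emit it under the join key. The shuffle groups rows by that key. Each reducer then does both the join and the max() aggregation for its key, with no second shuffle.

```python
from collections import defaultdict


def map_phase(table_a, table_b):
    """Emit (join_key, (tag, payload)) pairs; the tag records the source table."""
    for row_id, value in table_a:
        yield row_id, ("A", value)
    for row_id in table_b:
        yield row_id, ("B", None)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle does."""
    groups = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return groups


def reduce_phase(groups):
    """Join and aggregate in a single reducer call per key.

    Because the GROUP BY key equals the join key, every row that contributes
    to max(A.value) for a given id is already in this reducer's input.
    """
    result = {}
    for key, rows in groups.items():
        a_values = [v for tag, v in rows if tag == "A"]
        b_matches = sum(1 for tag, _ in rows if tag == "B")
        # Inner join: each A row pairs with every B row sharing the key.
        joined = [v for v in a_values for _ in range(b_matches)]
        if joined:
            result[key] = max(joined)
    return result


A = [(1, "apple"), (1, "pear"), (2, "fig"), (3, "kiwi")]  # (id, value) rows
B = [1, 2, 2]                                             # id rows
print(reduce_phase(shuffle(map_phase(A, B))))  # {1: 'pear', 2: 'fig'}
```

Id 3 has no match in B, so the inner join drops it. The duplicate B row for id 2 only repeats "fig", which leaves the max unchanged.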

If the join is a MapJoin, hive 0.11 can generate a single MR job (in this
case, it does not matter whether the join and the group by use the same
column(s)). To enable it, run
set hive.optimize.mapjoin.mapreduce=true;
and also make sure is larger
than the size of the small table.
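With a MapJoin there is no shuffle for the join at all, which is why the group-by columns stop mattering. The Python sketch below assumes the small table B fits in memory. Each mapper builds a hash table from B and probes it while streaming A, and the job's only shuffle serves the group by. The `small_table_limit` parameter is a hypothetical stand-in for the size threshold mentioned above. It counts rows here for simplicity and is not a Hive configuration name.

```python
from collections import Counter, defaultdict


def map_join_then_group(big_rows, small_ids, group_key, small_table_limit):
    """Single-job plan: map-side hash join, then one shuffle for GROUP BY.

    big_rows: (id, value) rows streamed from the large table A.
    small_ids: ids from the small table B, loaded into memory by each mapper.
    group_key: picks the GROUP BY key from a joined (id, value) row; any
        column works because the join is already finished on the map side.
    small_table_limit: hypothetical row-count threshold standing in for the
        size limit that decides whether a MapJoin is allowed.
    """
    if len(small_ids) > small_table_limit:
        raise ValueError("small table too large for a map-side join")

    # In-memory hash table of the small table; counts keep join multiplicity.
    small_table = Counter(small_ids)

    # Map side plus shuffle: probe the hash table for each big-table row and
    # group the joined values by the GROUP BY key.
    shuffled = defaultdict(list)
    for row_id, value in big_rows:
        for _ in range(small_table[row_id]):
            shuffled[group_key((row_id, value))].append(value)

    # Reduce side: the only aggregation step, computing max(A.value).
    return {key: max(values) for key, values in shuffled.items()}


A = [(1, "apple"), (1, "pear"), (2, "fig"), (3, "kiwi")]
B = [1, 2, 2]
print(map_join_then_group(A, B, lambda row: row[0], small_table_limit=10))
# {1: 'pear', 2: 'fig'}
print(map_join_then_group(A, B, lambda row: len(row[1]), small_table_limit=10))
# {5: 'apple', 4: 'pear', 3: 'fig'}
```

The second call groups by a key unrelated to the join column (the length of the value), yet it still needs only one shuffle. When the small table exceeds the limit, the sketch refuses the map-side plan instead of silently running out of memory.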
For hive trunk, drops the
flag "hive.optimize.mapjoin.mapreduce". So, in future releases, you will
not need to set hive.optimize.mapjoin.mapreduce.



On Thu, Aug 1, 2013 at 5:32 PM, Stephen Sprague <> wrote:

> And what version of Hive are you running your test on? I believe (though
> I am not certain) that Hive 0.11 includes the optimization you seek.
> On Thu, Aug 1, 2013 at 10:19 AM, Chen Song <> wrote:
>> Suppose we have 2 simple tables
>> A
>> id int
>> value string
>> B
>> id
>> When hive translates the following query
>> select max(A.value), from A join B on = group by;
>> It launches 2 stages, one for the join and one for the group by.
>> My understanding is that if the join key set is a subset of the group by
>> key set, the join and the group by can be done in the same MapReduce job.
>> If that is correct in theory, could it be a feature in Hive?
>> Chen
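The subset condition in the question can be checked concretely. The Python sketch below uses hypothetical row dictionaries and routes joined rows to reducers by the join column alone. It then reports whether any group-by group ends up split across reducers. It assumes an integer join column and uses `hash(x) % num_reducers` as a deterministic stand-in for Hadoop's partitioner. For small non-negative ints, Python's `hash(x) == x`.

```python
def groups_stay_local(rows, join_col, group_cols, num_reducers):
    """Check whether routing rows by the join column alone keeps every
    GROUP BY group on a single reducer for this particular data set.

    join_col is assumed to hold small non-negative integers, for which
    Python's hash(x) == x, so hash(x) % num_reducers is a deterministic
    stand-in for Hadoop's partitioner here.
    """
    reducer_of_group = {}
    for row in rows:
        # Shared shuffle: the reducer is chosen from the join key only.
        reducer = hash(row[join_col]) % num_reducers
        group = tuple(row[c] for c in group_cols)
        # A group that lands on two reducers would need a second shuffle.
        if reducer_of_group.setdefault(group, reducer) != reducer:
            return False
    return True


rows = [{"id": 1, "value": "x"}, {"id": 2, "value": "x"}, {"id": 3, "value": "y"}]
print(groups_stay_local(rows, "id", ["id", "value"], 2))  # True
print(groups_stay_local(rows, "id", ["value"], 2))        # False
```

A True result for one data set proves nothing in general. The guarantee comes from the condition itself. When `group_cols` contains `join_col`, rows with equal group keys also have equal join keys, so they always reach the same reducer and the check cannot fail.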
