hive-dev mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: hive on tez optimize MRR to MR?
Date Wed, 22 Apr 2015 09:11:27 GMT

To prevent bad reducer merging, the reducer merging only kicks in when the
optimizer thinks it gets a perf boost.

MR -> MRR is not a big win when it comes to Tez, due to container reuse -
going wide on the large cardinality in case of missing map-side
aggregation will be safer.

If and the userid set fits within memory, then smushing
the reducers would be nicer.

To reset the wide-narrow checks, do

set hive.optimize.reducededuplication.min.reducer=1;

But be aware that it will fail (I've seen full disks) as you scale upwards
to the 10+ TB cases.
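
A rough sketch of how one might check the effect on the query from the
original question, assuming the u_data table from that question exists and
the session is running on Tez. The engine setting and the EXPLAIN step are
my additions for illustration; compare the reducer stages in the plan with
and without the min.reducer override rather than trusting any particular
output shape.

```sql
-- Assumption: the session should run on Tez (may already be the default).
set hive.execution.engine=tez;

-- Reduce deduplication must be enabled for the merge to be considered at all.
set hive.optimize.reducededuplication=true;

-- Lower the threshold so the optimizer no longer skips the merge
-- for small reducer counts (the setting discussed above).
set hive.optimize.reducededuplication.min.reducer=1;

-- Inspect the plan instead of running the job; u_data is the table
-- from the quoted question.
EXPLAIN
select userid, count(*) from u_data group by userid order by userid;
```

Try this on a small dataset first, given the full-disk failures mentioned
above at larger scales.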


On 4/22/15, 2:15 PM, "" <> wrote:

>select userid,count(*) from u_data group by userid order by userid
>will produce MRR.
>I think when the result of userid,count(*) is small (one reducer can
>process the result), this query plan can be optimized to MR?