hive-user mailing list archives

From Gopal V <>
Subject Re: Insert into dynamic partitions performance
Date Sun, 07 Dec 2014 07:10:36 GMT
On 12/6/14, 10:11 PM, Daniel Haviv wrote:

> Isn't there a way to make hive allocate more than one reducer for the whole job? Maybe
> per partition.


Setting hive.optimize.sort.dynamic.partition=true does nearly that.

It raises the net number of useful reducers to up to total-num-of-partitions x buckets-per-partition.

If you have, say, data being written into six hundred partitions with 1
bucket each, it can use anywhere between 1 and 600 reducers (fewer when
hashcode collisions cause skew, of course).

It's turned off by default, because it noticeably slows down inserts
into a single partition without buckets.
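For reference, a minimal sketch of turning this on for a dynamic-partition insert; the table and column names here are illustrative, not from the thread, and the two hive.exec.dynamic.partition settings are the usual prerequisites for an all-dynamic partition insert:

```sql
-- Illustrative schema. nonstrict mode is needed when every
-- partition column is dynamic (no static partition value given).
SET hive.optimize.sort.dynamic.partition=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- With the optimization on, rows are distributed by the dynamic
-- partition key (ds), so the write fans out across reducers
-- instead of funneling through a single one.
INSERT OVERWRITE TABLE events_part PARTITION (ds)
SELECT user_id, payload, ds
FROM events_staging;
```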


>> On Dec 7, 2014, at 06:06, Gopal V <> wrote:
>>> On 12/6/14, 6:27 AM, Daniel Haviv wrote:
>>> Hi,
>>> I'm executing an insert statement that goes over 1TB of data.
>>> The map phase goes well, but the reduce stage uses only one reducer, which becomes
>>> a great bottleneck.
>> Are you inserting into a bucketed or sorted table?
>> If the destination table is bucketed + partitioned, you can use the dynamic partition
>> sort optimization to get beyond the single reducer.
>> Cheers,
>> Gopal
