hive-user mailing list archives

From: Daniel Haviv <daniel.ha...@veracity-group.com>
Subject: Re: Insert into dynamic partitions performance
Date: Sun, 07 Dec 2014 07:17:45 GMT
I see.
Thanks a lot that's very helpful!

Daniel

> On Dec 7, 2014, at 09:10, Gopal V <gopalv@apache.org> wrote:
> 
>> On 12/6/14, 10:11 PM, Daniel Haviv wrote:
>> 
>> Isn't there a way to make hive allocate more than one reducer for the whole job? Maybe one per partition.
> 
> Yes.
> 
> hive.optimize.sort.dynamic.partition=true; does nearly that.
> 
> It raises the net number of useful reducers to total-num-of-partitions x total-num-buckets.
> 
> If you have, say, data being written into six hundred partitions with 1 bucket each, it can use anywhere between 1 and 600 reducers (hashcode collisions causing skews, of course).
> 
> It's turned off by default, because it really slows down insert speed for the single-partition, no-buckets case.
> 
> Cheers,
> Gopal
> 
>>>> On Dec 7, 2014, at 06:06, Gopal V <gopalv@apache.org> wrote:
>>>> 
>>>> On 12/6/14, 6:27 AM, Daniel Haviv wrote:
>>>> Hi,
>>>> I'm executing an insert statement that goes over 1TB of data.
>>>> The map phase goes well but the reduce stage only uses one reducer, which becomes a great bottleneck.
>>> 
>>> Are you inserting into a bucketed or sorted table?
>>> 
>>> If the destination table is bucketed + partitioned, you can use the dynamic partition sort optimization to get beyond the single reducer.
>>> 
>>> Cheers,
>>> Gopal
> 
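
A minimal sketch of the kind of insert being discussed, assuming hypothetical table names (web_logs, web_logs_raw), a hypothetical bucket count of 4, and ORC storage; the only setting taken from the thread itself is hive.optimize.sort.dynamic.partition:

  -- Allow dynamic partition inserts and turn on the sorted dynamic partition optimization.
  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  SET hive.optimize.sort.dynamic.partition=true;

  -- Hypothetical destination: partitioned by dt, bucketed on user_id.
  -- With the optimization on, the insert can use up to
  -- (number of dt partitions) x (number of buckets) reducers instead of one.
  CREATE TABLE web_logs (
    user_id BIGINT,
    url     STRING
  )
  PARTITIONED BY (dt STRING)
  CLUSTERED BY (user_id) INTO 4 BUCKETS
  STORED AS ORC;

  -- Dynamic partition insert: the partition column (dt) comes last in the SELECT.
  INSERT INTO TABLE web_logs PARTITION (dt)
  SELECT user_id, url, dt
  FROM web_logs_raw;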
