hive-user mailing list archives

From Daniel Haviv <daniel.ha...@veracity-group.com>
Subject Insert into dynamic partitions performance
Date Sat, 06 Dec 2014 14:27:23 GMT
Hi,
I'm executing an insert statement that goes over 1TB of data.
The map phase goes well, but the reduce stage uses only one reducer, which becomes a major bottleneck.

I've tried setting the number of reducers to four and adding a DISTRIBUTE BY clause to the
statement, but it still runs with just one reducer.
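
For reference, here is a minimal sketch of the kind of statement described, not the exact query. The table and column names (target_table, source_table, col_a, col_b, part_col) are hypothetical, and the settings shown are an assumption about how the reducer count was requested.

```sql
-- Hypothetical names throughout: target_table, source_table, col_a, col_b, part_col.

-- Allow the insert to create partitions dynamically from the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Request four reducers (mapreduce.job.reduces is the newer name for this property).
SET mapred.reduce.tasks=4;

-- The dynamic partition column comes last in the SELECT list;
-- DISTRIBUTE BY routes rows to reducers by the partition column's value.
INSERT OVERWRITE TABLE target_table PARTITION (part_col)
SELECT col_a, col_b, part_col
FROM source_table
DISTRIBUTE BY part_col;
```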

How can I increase reducer parallelism?

Thanks,
Daniel