hive-user mailing list archives

From Saurabh Mishra <>
Subject Hive Query Unable to distribute load evenly in reducers
Date Mon, 15 Oct 2012 12:09:54 GMT
I am running some Hive queries that join tables containing up to 30 million records each. Since
the load on the reducers is very significant in these cases, I explicitly set the following
parameters before executing the queries:

set mapred.reduce.tasks=100;
set hive.exec.reducers.bytes.per.reducer=500000000;
set hive.optimize.cp=true;

The job now spawns 160 reducers, but despite the high number most of the load
remains on 1 or 2 reducers. In the final statistics, 158 reducers completed within
2-3 minutes of starting, while 2 reducers took 2 hours to run.
Is there any way to overcome this disparity in load distribution?
Any help in this regard will be highly appreciated.
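
To illustrate the symptom, here is a minimal sketch in Python, not taken from the actual job. It assumes each join key is routed to a reducer by hashing the key modulo the reducer count, and it uses a made-up key distribution (two hypothetical "hot" keys holding 10 million rows each, plus 10,000 small keys) that adds up to 30 million rows. The function name `reducer_loads` and all the numbers are assumptions for illustration only.

```python
def reducer_loads(key_counts, num_reducers):
    """Rows received by each reducer when a key goes to (key % num_reducers).

    Integer keys stand in for the hash of the real join key (an assumption).
    """
    loads = [0] * num_reducers
    for key, rows in key_counts.items():
        loads[key % num_reducers] += rows
    return loads


# Hypothetical distribution: 10,000 ordinary keys with 1,000 rows each...
key_counts = {k: 1_000 for k in range(2, 10_002)}
# ...and two hot keys with 10 million rows each (30 million rows in total).
key_counts[0] = 10_000_000
key_counts[1] = 10_000_000

for n in (100, 160):
    loads = sorted(reducer_loads(key_counts, n), reverse=True)
    # The two largest loads stay near 10 million no matter how many reducers
    # are used, because all rows of one key must go to the same reducer.
    print(n, "reducers -> two heaviest:", loads[:2], "third heaviest:", loads[2])
```

Under these assumptions, raising the reducer count only shrinks the small reducers; the two that receive the hot keys keep essentially the same load, which would match the 158 fast and 2 slow reducers described above if a few join keys dominate the data.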

Saurabh Mishra