hive-user mailing list archives

From Mark Grover <mgro...@oanda.com>
Subject Re: Hive Reducers hanging - interesting problem - skew ?
Date Mon, 05 Dec 2011 22:09:03 GMT
jS,
Check whether this helps:
http://search-hadoop.com/m/l1usr1MAHX32&subj=Re+Severely+hit+by+curse+of+last+reducer+



Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "john smith" <js1987.smith@gmail.com>
To: user@hive.apache.org
Sent: Monday, December 5, 2011 4:38:14 PM
Subject: Hive Reducers hanging - interesting problem - skew ?

Hi list, 

I am trying to run a join query on my 10-node cluster. My query looks as follows:

select * from A JOIN B on (A.a = B.b) 

size of A = 15 million rows 
size of B = 1 million rows 

The problem is that A.a and B.b each have only around 25-30 distinct values, which implies that
they have high selectivities (each key matches a very large number of rows) and the reducers are bulky.
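
To put a rough number on "bulky", here is a back-of-the-envelope sketch in Python. It assumes the join keys are spread uniformly across both tables, which the message does not state; real skew would concentrate even more work on a few reducers. The function name is hypothetical and only for illustration.

```python
# Rough estimate of equi-join output size.
# ASSUMPTION: keys are uniformly distributed in both tables (not stated in the message).
ROWS_A = 15_000_000  # size of A, from the message
ROWS_B = 1_000_000   # size of B, from the message


def estimated_join_rows(rows_a, rows_b, distinct_keys):
    """Output rows if every key holds an equal share of each table."""
    per_key_a = rows_a / distinct_keys
    per_key_b = rows_b / distinct_keys
    # Each key emits the cross product of its matching rows from A and B.
    return per_key_a * per_key_b * distinct_keys


for k in (25, 30):
    rows = estimated_join_rows(ROWS_A, ROWS_B, k)
    print(f"{k} distinct keys: about {rows:.1e} output rows")
```

Under that assumption the join emits on the order of 5e11 to 6e11 rows, and since each key goes to a single reducer, at most 25-30 reducers can share the work.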

However, the performance hit is so severe that ALL my reducers hang at 75% for 6 hours and
don't move any further.

The only thing the log shows during all this time is lines like "Join operator - forwarding rows ---------------<Huge
number>". What does this mean?
There is no swapping happening, and CPU usage stays at around 40% the whole time (observed
through Ganglia).

Any way I can solve this problem? Can anyone help me with this? 

Thanks, 
jS 


