hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@math.unl.edu>
Subject map-tasks "beating up" a node
Date Thu, 13 Nov 2008 21:59:39 GMT
Hey all,

When we run a large job (lots of intermediate output), we have a real
problem with it "beating up" nodes during the shuffle phase. About half
of our nodes are completely overwhelmed by the number of reducers
requesting data to be copied (somewhere between 50 and 75 files appear
to be open); this drives I/O wait on the node up to about 50-75% of CPU
time. The number of active threads is about 45, and the amount of
intermediate data is about 10 GB per node.

Has anyone else run into this problem? Any hints? My first thought is to
ask the user to use a combiner so there are fewer intermediate files to
work with in the first place.
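For anyone unfamiliar with the idea: a combiner pre-aggregates map output
per key on the map side, so far fewer records are spilled and later fetched
by the reducers during the shuffle. A minimal plain-Java sketch of that
aggregation step (no Hadoop dependencies; class and method names are just
illustrative, not Hadoop API) might look like:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Locally merge (key, count) pairs the way a word-count combiner would,
    // collapsing duplicate keys before anything crosses the network.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Five raw map-output records collapse to two combined records.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("a", 1), Map.entry("b", 1),
            Map.entry("a", 1), Map.entry("a", 1), Map.entry("b", 1));
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println(combined.size() + " records after combining");
    }
}
```

In a real MapReduce job this logic lives in a Reducer-style class that the
job registers as its combiner; the payoff here would be smaller spill files
and fewer bytes for each of those 50-75 concurrent fetches to pull.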
