hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Wiley <kwi...@keithwiley.com>
Subject Slow shuffle stage?
Date Fri, 11 Nov 2011 01:20:17 GMT
What sorts of causes might be responsible for a long or slow shuffle stage?  For example, I
have a job of 266 maps (each emitting 4 records) and 17 reduces (each ingesting about 60 records)
that takes 72 minutes to complete.  The maps tend to run in about 9-13 minutes (the value
in parentheses under the Finish Time column of the map task list in the job tracker and the
reduces run in about 37 minutes (same column).  If I click into a specific reduce task, I
see a Finish Time of 37 minutes of course, and a Shuffle time of about 27 minutes.

So, 11 minutes were spent in the maps, 10 in the reduces, and 27 shuffling.  Note that the
72 minute overall job time is considerably longer than the sum of these three averages because
of a few outlier maps (25 minutes, one even took 37 minutes) that held up the later stages).

Disregarding the outliers, it's still spending more than 50% of the job time (27 out of 48
minutes) shuffling instead of doing actual computation in the maps and reducers.  This feels
inefficient to me.

What causes this and what can be done to improve it?


Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda

View raw message