hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Keith Wiley <kwi...@keithwiley.com>
Subject Re: Slow shuffle stage?
Date Fri, 11 Nov 2011 15:57:15 GMT
Oh, so the reported shuffle time incorporate the mapper time.  I suppose that makes sense since
the two stages overlap.  That explains a lot.

As to the map times, it mostly due to task attempt failures.  Any attempt that fails costs
a lot of time.  It is almost never a data-driven error, so subsequent attempts succeed, but
it costs time when the earlier attempts fail.  The cause is usually some cluster hairiness
that I can neither control (I'm not the administrator) or barely comprehend: failure to retrieve
blocks, random err 141 and 139 failures, that sort of thing.

I'll check on the speculative execution.  I can't remember...I think so though.

On Nov 11, 2011, at 5:53 AM, Joey Echeverria wrote:

> Another thing to look at is the map outlier. The shuffle will start by default when 5%
of the maps are done, but won't finish until after the last map is done. Since one of your
maps took 37 minutes, your shuffle take at least that long.
> I would check the following:
> Is the input skewed?
> Does the same node always result in the 37 minute map task?
> Is speculative execution turned on?
> -Joey
> On Nov 10, 2011, at 22:07, Prashant Sharma <prashant.iiith@gmail.com> wrote:
>> Can you tell us about your cluster, Is it single node? how big is your data
>> then.? Or the bandwidth between nodes. (cause copy might take time in that
>> case)
>> -P
>> On Fri, Nov 11, 2011 at 6:50 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
>>> What sorts of causes might be responsible for a long or slow shuffle
>>> stage?  For example, I have a job of 266 maps (each emitting 4 records) and
>>> 17 reduces (each ingesting about 60 records) that takes 72 minutes to
>>> complete.  The maps tend to run in about 9-13 minutes (the value in
>>> parentheses under the Finish Time column of the map task list in the job
>>> tracker and the reduces run in about 37 minutes (same column).  If I click
>>> into a specific reduce task, I see a Finish Time of 37 minutes of course,
>>> and a Shuffle time of about 27 minutes.
>>> So, 11 minutes were spent in the maps, 10 in the reduces, and 27
>>> shuffling.  Note that the 72 minute overall job time is considerably longer
>>> than the sum of these three averages because of a few outlier maps (25
>>> minutes, one even took 37 minutes) that held up the later stages).
>>> Disregarding the outliers, it's still spending more than 50% of the job
>>> time (27 out of 48 minutes) shuffling instead of doing actual computation
>>> in the maps and reducers.  This feels inefficient to me.
>>> What causes this and what can be done to improve it?
>>> Thanks.

Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley

View raw message