hadoop-user mailing list archives

From Varun Vasudev <vvasu...@apache.org>
Subject Re: mapper re-run behavior when node dies
Date Sun, 01 May 2016 09:13:54 GMT
Hi Alvin,

If the node dies before all the reducers can fetch the data for all the maps that ran on the
node, the maps on that node will be re-run. If the data for the maps has been copied, you’re

You should look at TooManyFetchFailureTransition in TaskAttemptImpl and handleUpdatedNodes
in RMContainerAllocator for details on the implementation.
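
To make the rule above concrete, here is a minimal sketch of the decision as I described it. It is a simplified model, not the Hadoop code: the class MapRerunSketch, the CompletedMap type, and the mapsToRerun method are hypothetical names made up for illustration, and tracking fetches per map is an assumption (the real implementation may decide at a coarser, per-node level).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model only. None of these names come from Hadoop.
class MapRerunSketch {

    /** A completed map task: the node holding its output and the reducers that fetched it. */
    static final class CompletedMap {
        final String mapId;
        final String node;
        final Set<Integer> fetchedBy = new HashSet<>();

        CompletedMap(String mapId, String node) {
            this.mapId = mapId;
            this.node = node;
        }

        /** True if every reducer 0..numReducers-1 has already copied this map's output. */
        boolean fetchedByAll(int numReducers) {
            for (int r = 0; r < numReducers; r++) {
                if (!fetchedBy.contains(r)) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Returns the ids of completed maps that would need to re-run after deadNode is lost:
     * those whose output lived on the dead node and was not yet fetched by every reducer.
     */
    static List<String> mapsToRerun(List<CompletedMap> maps, String deadNode, int numReducers) {
        List<String> rerun = new ArrayList<>();
        for (CompletedMap m : maps) {
            boolean outputLost = m.node.equals(deadNode);
            if (outputLost && !m.fetchedByAll(numReducers)) {
                rerun.add(m.mapId);
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        List<CompletedMap> maps = new ArrayList<>();

        CompletedMap m1 = new CompletedMap("m1", "node-a");
        m1.fetchedBy.add(0);            // only reducer 0 has copied m1
        maps.add(m1);

        CompletedMap m2 = new CompletedMap("m2", "node-a");
        m2.fetchedBy.add(0);
        m2.fetchedBy.add(1);            // both reducers have copied m2
        maps.add(m2);

        maps.add(new CompletedMap("m3", "node-b")); // different node, unaffected

        // node-a dies while reducer 1 still needs m1
        System.out.println(mapsToRerun(maps, "node-a", 2)); // prints [m1]
    }
}
```

In this model, m2 is safe because both reducers already hold its output, and m3 is untouched because its node is still alive; only m1 is scheduled again.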


From:  Alvin Chyan <alvin.chyan@turn.com>
Date:  Wednesday, April 13, 2016 at 12:02 AM
To:  <user@hadoop.apache.org>
Subject:  mapper re-run behavior when node dies

Hi all,
If a mapper has finished and its output has already been copied to a reducer, and the node
then dies, do the mappers that wrote their output to that node have to be re-run?

The use case is running mappers on flaky/preemptible nodes. In a cloud environment, preemptible
nodes can be much cheaper. If the mappers always have to re-run when the node dies, then we
may never make progress, since machines can take turns dying, each taking with it the output
of all map tasks that ran on it.

I have done some testing, and it seems that if the reducer has finished the shuffle and has
started reducing, then it doesn't matter if the mapper node dies. However, while the reducer
was still in the copy phase, I killed another machine and a bunch of mappers had to re-run.
I'm not sure how to confirm that the data had been successfully copied to the reducer, though.

Answers or pointers to relevant portions of the code would be greatly appreciated.

Alvin Chyan
Sr. Software Engineer, Data
