hadoop-user mailing list archives

From Alvin Chyan <alvin.ch...@turn.com>
Subject Re: mapper re-run behavior when node dies
Date Mon, 09 May 2016 17:15:53 GMT
Thanks for the reply, Varun! I quickly got lost in the code since I
couldn't figure out which implementation of the interface was being passed
into the context.

From your comment, it sounds like ALL the maps that ran on that node need
to have their output copied before a map is safe from having to be re-run.
Thus, even if 95% of the mappers on that node had their output copied before
the node died, all of them have to re-run. If that's the case, then it
does seem like there's room for optimization.
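
To make that reading concrete, here is a minimal sketch of the rule as I understand it from this thread. It is a toy model only, not Hadoop's actual implementation: the class MapRerunSketch, the method mapsToRerun, and the node, map, and reducer IDs are all hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MapRerunSketch {

    /**
     * Toy model (assumption, not Hadoop code): returns the IDs of the maps
     * that must be re-run after lostNode dies.
     *
     * completedMapsByNode: node -> IDs of maps that finished on that node
     * fetchedByReducer:    reducer -> IDs of maps whose output it has copied
     */
    public static Set<String> mapsToRerun(String lostNode,
                                          Map<String, Set<String>> completedMapsByNode,
                                          Map<String, Set<String>> fetchedByReducer) {
        Set<String> onNode =
                completedMapsByNode.getOrDefault(lostNode, Collections.emptySet());
        for (Set<String> fetched : fetchedByReducer.values()) {
            if (!fetched.containsAll(onNode)) {
                // At least one reducer is still missing output that lived on
                // the lost node, so every map from that node is re-run.
                return new TreeSet<>(onNode);
            }
        }
        // Every reducer already holds all output from this node
        // (this is also the result when there are no reducers at all).
        return new TreeSet<>();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> maps = new HashMap<>();
        maps.put("n1", new HashSet<>(Arrays.asList("m1", "m2")));

        Map<String, Set<String>> fetched = new HashMap<>();
        fetched.put("r1", new HashSet<>(Arrays.asList("m1", "m2")));
        fetched.put("r2", new HashSet<>(Arrays.asList("m1"))); // r2 lacks m2

        System.out.println(mapsToRerun("n1", maps, fetched)); // prints [m1, m2]

        fetched.get("r2").add("m2"); // r2 finishes copying m2
        System.out.println(mapsToRerun("n1", maps, fetched)); // prints []
    }
}
```

In this model a single missing fetch (r2 has not copied m2) forces both m1 and m2 to re-run, even though m1 was already copied everywhere. That is the case where a finer-grained check could save work.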

Thanks!


*Alvin Chyan*, Sr. Software Engineer, Data
901 Marshall St, Suite 200, Redwood City, CA 94063


turn.com <http://www.turn.com/>   |   @TurnPlatform
<https://twitter.com/@TurnPlatform>

This message is Turn Confidential, except for information included that is
already available to the public. If this message was sent to you
accidentally, please delete it.

On Sun, May 1, 2016 at 2:13 AM, Varun Vasudev <vvasudev@apache.org> wrote:

> Hi Alvin,
>
> If the node dies before all the reducers can fetch the data for all the
> maps that ran on the node, the maps on that node will be re-run. If the
> data for the maps has been copied, you’re fine.
>
> You should look at TooManyFetchFailureTransition in TaskAttemptImpl
> and handleUpdatedNodes in RMContainerAllocator for details on the
> implementation.
>
> -Varun
>
> From: Alvin Chyan <alvin.chyan@turn.com>
> Date: Wednesday, April 13, 2016 at 12:02 AM
> To: <user@hadoop.apache.org>
> Subject: mapper re-run behavior when node dies
>
> Hi all,
> If a mapper has finished and its output has already been copied to a reducer,
> and then the node dies, do the mappers that wrote their output to that node's
> local disk have to be re-run?
>
> The use case is running mappers on flaky/preemptible nodes. In a cloud
> environment, preemptible nodes can be much cheaper. If the mappers always
> have to re-run when the node dies, then we may never make progress, as
> machines can take turns dying, each one taking with it the output of all map
> tasks that ran on it.
>
> I have done some testing, and it seems that if the reducer has finished
> the shuffle and has started reducing, then it doesn't matter if the mapper
> node died. However, while the reducer was still in the copy phase, I killed
> another machine and then a bunch of mappers had to re-run. I'm not sure how
> to confirm that the data had been successfully copied to the reducer, though.
>
> Answers or pointers to relevant portions of the code would be greatly
> appreciated.
>
> Thanks!
