spark-issues mailing list archives

From "Matt Cheah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-8167) Tasks that fail due to YARN preemption can cause job failure
Date Wed, 24 Jun 2015 19:43:05 GMT

    [ https://issues.apache.org/jira/browse/SPARK-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600012#comment-14600012 ]

Matt Cheah commented on SPARK-8167:
-----------------------------------

While designing this, I've found that it's not obvious how to transfer the executor's exit
code from the remote machine back to the driver. When an executor dies, the driver sees the
connection drop and simply removes the executor, without ever learning what the exit code
was. The exit code is particularly hard to obtain in YARN mode.

Does anyone have thoughts on how to get the executor's exit code to the driver in
yarn-client mode?
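
To make the question concrete, here is a minimal sketch of one possible shape for this, written in Python for brevity rather than Spark's Scala. It assumes the ApplicationMaster learns the container's exit status from YARN and forwards it to the driver in a separate message. The `ExecutorExited` message and the `DriverLossTracker` class are hypothetical names used only for illustration and are not part of Spark. The only real value is -102, which is Hadoop's `ContainerExitStatus.PREEMPTED`.

```python
from dataclasses import dataclass
from typing import Dict

# Numeric value of Hadoop's ContainerExitStatus.PREEMPTED.
YARN_PREEMPTED = -102


@dataclass
class ExecutorExited:
    """Hypothetical message an ApplicationMaster could send to the driver."""
    executor_id: str
    exit_status: int


class DriverLossTracker:
    """Hypothetical driver-side bookkeeping; not Spark's actual scheduler code."""

    def __init__(self) -> None:
        self.reasons: Dict[str, str] = {}

    def on_disconnect(self, executor_id: str) -> None:
        # A dropped connection alone says nothing about why the executor died,
        # so record "unknown" unless a reason has already been reported.
        self.reasons.setdefault(executor_id, "unknown")

    def on_executor_exited(self, msg: ExecutorExited) -> None:
        # The forwarded exit status lets the driver tell preemption apart
        # from any other kind of exit.
        if msg.exit_status == YARN_PREEMPTED:
            self.reasons[msg.executor_id] = "preempted"
        else:
            self.reasons[msg.executor_id] = "failed"

    def reason(self, executor_id: str) -> str:
        return self.reasons.get(executor_id, "alive")
```

The sketch deliberately tolerates either ordering of the two events, since the disconnect and the forwarded exit status could reach the driver in any order.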

> Tasks that fail due to YARN preemption can cause job failure
> ------------------------------------------------------------
>
>                 Key: SPARK-8167
>                 URL: https://issues.apache.org/jira/browse/SPARK-8167
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, YARN
>    Affects Versions: 1.3.1
>            Reporter: Patrick Woody
>            Assignee: Matt Cheah
>            Priority: Blocker
>
> Tasks that are running on preempted executors will count as FAILED with an ExecutorLostFailure.
> Unfortunately, this can quickly spiral out of control if a large resource shift is occurring,
> and the tasks get scheduled to executors that immediately get preempted as well.
> The current workaround is to increase spark.task.maxFailures very high, but that can
> cause delays in true failures. We should ideally differentiate these task statuses so that
> they don't count towards the failure limit.
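
The differentiation requested in the description can be illustrated with a small sketch, written in Python for brevity rather than Spark's Scala. It is not Spark's scheduler code: `TaskFailureCounter` and its `preempted` flag are hypothetical, and the sketch assumes the scheduler already knows whether a given failure was caused by preemption. The default limit of 4 mirrors the default of spark.task.maxFailures.

```python
from collections import defaultdict


class TaskFailureCounter:
    """Hypothetical per-task failure counter that ignores preemption."""

    def __init__(self, max_failures: int = 4) -> None:
        self.max_failures = max_failures
        self.counts = defaultdict(int)

    def record_failure(self, task_index: int, preempted: bool) -> bool:
        """Record one failed attempt; return True if the job should abort."""
        if not preempted:
            # Only true failures count towards the limit.
            self.counts[task_index] += 1
        return self.counts[task_index] >= self.max_failures
```

With this split, a task can be preempted any number of times without aborting the job, while true failures still hit the limit after `max_failures` attempts, so the limit does not need to be raised as a workaround.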



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


