spark-issues mailing list archives

From "Susan X. Huynh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21419) Support Mesos failover_timeout in driver (Mesos cluster mode)
Date Tue, 18 Jul 2017 14:52:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091657#comment-16091657
] 

Susan X. Huynh commented on SPARK-21419:
----------------------------------------

I split this into two sub-tasks: (1) making the failover_timeout configurable and (2) adding
an explicit teardown to the cases where we currently rely on the master to time out immediately
and perform the teardown itself.

> Support Mesos failover_timeout in driver (Mesos cluster mode)
> -------------------------------------------------------------
>
>                 Key: SPARK-21419
>                 URL: https://issues.apache.org/jira/browse/SPARK-21419
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 2.2.0
>            Reporter: Susan X. Huynh
>
> In Mesos cluster mode, the driver framework's failover_timeout is currently set to zero.
> This means that if the driver temporarily loses connectivity with the master, the driver is
> considered disconnected, and the master will immediately kill all tasks and executors associated
> with the framework.
>
> To avoid this behavior, I would like to make this failover_timeout configurable. A user
> could then set it to a non-zero value, so that during a temporary disconnection the master
> would wait before tearing down the framework.
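
To illustrate the behavior described in the issue, here is a minimal sketch of a toy model in Python. It is not Spark or Mesos code. The property name `spark.mesos.driver.failoverTimeout`, the `Framework` class, and both helper functions are assumptions made for illustration; this message does not name the configuration key, and the real master logic is more involved.

```python
from dataclasses import dataclass

# Assumed property name, for illustration only; the message does not name one.
FAILOVER_TIMEOUT_KEY = "spark.mesos.driver.failoverTimeout"


@dataclass
class Framework:
    """Hypothetical stand-in for the driver framework registered with the master."""
    failover_timeout_s: float  # 0.0 mirrors the current fixed value
    torn_down: bool = False


def framework_from_conf(conf: dict) -> Framework:
    """Read the timeout from a config mapping, defaulting to zero as today."""
    timeout = float(conf.get(FAILOVER_TIMEOUT_KEY, 0.0))
    if timeout < 0:
        raise ValueError("failover timeout must be non-negative")
    return Framework(failover_timeout_s=timeout)


def master_handles_outage(fw: Framework, outage_s: float) -> Framework:
    """Toy master: tear the framework down unless the driver reconnects in time."""
    if outage_s > 0 and outage_s >= fw.failover_timeout_s:
        fw.torn_down = True  # all tasks and executors would be killed here
    return fw
```

In this model, a 5-second outage tears down a framework built from an empty config (timeout zero), while the same outage is tolerated once the timeout is set to 60 seconds. An outage longer than the configured timeout still ends in teardown.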



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

