mesos-dev mailing list archives

From "Benjamin Hindman (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-106) Failover timeout should default to 0
Date Mon, 19 Dec 2011 19:07:32 GMT

    [ https://issues.apache.org/jira/browse/MESOS-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172520#comment-13172520 ]

Benjamin Hindman commented on MESOS-106:
----------------------------------------

Actually, it would be sweet if we made that a constant in master/constants.hpp so that all
constants are defined there instead of at the actual configuration 'get' sites.
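
A minimal sketch of that suggestion, not the actual Mesos code: the constant name, the "failover_timeout" key, and the Configuration class below are all hypothetical stand-ins. It only illustrates keeping the default in one constants header so the 'get' site refers to it by name.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for master/constants.hpp: every default lives here.
namespace constants {
// Failover timeout default in seconds; 0 follows the proposal in MESOS-106.
const double DEFAULT_FAILOVER_TIMEOUT_SECS = 0.0;
}

// Hypothetical minimal configuration class whose 'get' takes a default value.
class Configuration {
public:
  void set(const std::string& key, double value) { values[key] = value; }

  // Returns the stored value, or defaultValue when the key is not set.
  double get(const std::string& key, double defaultValue) const {
    std::map<std::string, double>::const_iterator it = values.find(key);
    return it == values.end() ? defaultValue : it->second;
  }

private:
  std::map<std::string, double> values;
};

// The 'get' site names the shared constant instead of repeating a literal.
// The key "failover_timeout" is an assumption made for this sketch.
double failoverTimeout(const Configuration& conf) {
  return conf.get("failover_timeout", constants::DEFAULT_FAILOVER_TIMEOUT_SECS);
}
```

With this layout, changing the default means editing one line in the constants header rather than hunting for each 'get' call.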
                
> Failover timeout should default to 0
> ------------------------------------
>
>                 Key: MESOS-106
>                 URL: https://issues.apache.org/jira/browse/MESOS-106
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>         Attachments: MESOS-106.patch
>
>
> Since the failover timeout was added, you get a lot of weird behavior in clusters running
> frameworks that don't support failover due to its long default value of 1 day. If a framework
> fails or just exits without calling driver.stop(), all its executors stay around and consume
> resources on the machines, causing subsequent runs to mysteriously fail to acquire resources.
> See http://groups.google.com/group/spark-users/msg/553af12424e4ed3d for an example. I know
> that the failover timeout is supposed to eventually become a per-framework parameter anyway,
> but in the meantime, the easiest way to prevent this is to set it to 0, because almost no
> users have failover-enabled frameworks.

