hadoop-yarn-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-3607) Allow users to choose between failing the daemons vs failing the apps/containers
Date Fri, 08 May 2015 20:36:00 GMT
Karthik Kambatla created YARN-3607:
--------------------------------------

             Summary: Allow users to choose between failing the daemons vs failing the apps/containers
                 Key: YARN-3607
                 URL: https://issues.apache.org/jira/browse/YARN-3607
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager, resourcemanager, scheduler
    Affects Versions: 2.7.0
            Reporter: Karthik Kambatla
            Assignee: Karthik Kambatla


We often run into cases where we must choose between failing the daemon (fail-fast) and
failing the user's work while keeping the cluster running. There is no clear right way to handle
these situations: some users would like to be conservative and let the daemons run, while
others would like to fail fast.

Today, we handle these case by case, going by what the people working on each issue feel is the
right way to handle it. Examples include how we handle app recovery failures and queue changes
on RM restart.

Users should be able to choose between these two extremes, and have all such situations handled
consistently according to that choice.
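
As a rough illustration of the idea, a single policy value could be consulted wherever a daemon
hits an error tied to one app. The sketch below is only an assumption about how such a switch
might look. The names FailurePolicySketch, FailurePolicy, DaemonFailFastException and handle are
hypothetical and are not existing YARN classes, methods or configuration properties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: one cluster-wide policy applied to every such error. */
public class FailurePolicySketch {

    /** The two extremes described in this issue (names are illustrative). */
    public enum FailurePolicy { FAIL_DAEMON, FAIL_APP }

    /** Signals that the daemon should stop instead of continuing (fail-fast). */
    public static class DaemonFailFastException extends RuntimeException {
        public DaemonFailFastException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final FailurePolicy policy;

    public FailurePolicySketch(FailurePolicy policy) {
        this.policy = policy;
    }

    /**
     * Applies the configured policy to an error tied to a single app.
     * Under FAIL_DAEMON the error is rethrown so the caller can shut down;
     * under FAIL_APP only the given app is failed and the daemon keeps running.
     */
    public void handle(String appId, Exception error, Consumer<String> failApp) {
        if (policy == FailurePolicy.FAIL_DAEMON) {
            throw new DaemonFailFastException("Failing daemon, error in " + appId, error);
        }
        failApp.accept(appId);
    }

    public static void main(String[] args) {
        List<String> failedApps = new ArrayList<>();
        FailurePolicySketch conservative = new FailurePolicySketch(FailurePolicy.FAIL_APP);
        // Hypothetical app id; the recovery error is simulated.
        conservative.handle("app_0001", new IllegalStateException("recovery failed"),
                failedApps::add);
        System.out.println("Failed apps: " + failedApps);
    }
}
```

With a switch like this, call sites such as app recovery or queue validation on RM restart would
all route through the same decision instead of each hard-coding its own behavior.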



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
