hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4079) Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly in daemons
Date Tue, 25 Aug 2015 15:09:46 GMT

     [ https://issues.apache.org/jira/browse/YARN-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-4079:
-------------------------------
    Description: 
Currently, every daemon explicitly sets this config to true so that it crashes instead of
hanging around. This seems correct: a recoverable exception should be caught and handled,
NOT leaked through to AsyncDispatcher, and a non-recoverable one should lead to a crash
anyway.
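
To make the behaviour concrete, here is a minimal sketch of what an exit-on-error flag does
when an exception leaks out of an event handler. MiniDispatcher and its exitHook parameter are
hypothetical names invented for this illustration; this is not the AsyncDispatcher code, and
the hook stands in for a real process exit so the sketch can be run safely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a dispatcher; not Hadoop's AsyncDispatcher. */
final class MiniDispatcher {
    private final boolean exitOnError;
    private final Runnable exitHook;              // assumption: replaces a real process exit
    final List<String> errors = new ArrayList<>();

    MiniDispatcher(boolean exitOnError, Runnable exitHook) {
        this.exitOnError = exitOnError;
        this.exitHook = exitHook;
    }

    /** Runs one event handler and returns true if it completed normally. */
    boolean dispatch(Runnable handler) {
        try {
            handler.run();
            return true;
        } catch (RuntimeException e) {
            // The handler failed to catch this itself, so it leaked to the dispatcher.
            errors.add(String.valueOf(e.getMessage()));
            if (exitOnError) {
                exitHook.run();                   // crash instead of hanging around
            }
            return false;                         // otherwise record it and keep dispatching
        }
    }

    public static void main(String[] args) {
        MiniDispatcher lenient = new MiniDispatcher(false, () -> System.out.println("exit requested"));
        lenient.dispatch(() -> { throw new IllegalStateException("leaked exception"); });
        System.out.println("still running, errors recorded: " + lenient.errors.size());
    }
}
```

With the flag set to false the dispatcher only records the failure and moves on to the next
event; with it set to true the same leaked exception triggers the exit hook.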

But this can make the system more fragile if we fail to catch every recoverable exception.

Currently we do not even have the option of setting it to false in the configuration, even
if we wanted to.

We could instead read this value from the configuration and set it to true in daemons only if
it is not configured. That way, if an exception in a production cluster is crashing a daemon
frequently, and we find that it is unavoidable but not a serious problem (i.e. the daemon can
still work normally for the most part), we can at least set the value to false in the config file.
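
A minimal sketch of the proposed lookup follows. It uses java.util.Properties as a stand-in
for the daemon's configuration object so that it runs without Hadoop on the classpath; the
class name ExitOnErrorConfig and the constant DAEMON_DEFAULT are assumptions made for this
illustration, and only the key yarn.dispatcher.exit-on-error comes from the issue itself.

```java
import java.util.Properties;

/** Hypothetical helper: honour an explicit setting, otherwise default to true in daemons. */
final class ExitOnErrorConfig {
    static final String EXIT_ON_ERROR_KEY = "yarn.dispatcher.exit-on-error";
    static final boolean DAEMON_DEFAULT = true;   // assumption: daemons crash unless told otherwise

    static boolean resolve(Properties conf) {
        String raw = conf.getProperty(EXIT_ON_ERROR_KEY);
        if (raw == null) {
            return DAEMON_DEFAULT;                // not configured: keep today's behaviour
        }
        String value = raw.trim();
        if (value.equalsIgnoreCase("true")) {
            return true;
        }
        if (value.equalsIgnoreCase("false")) {
            return false;                         // operator explicitly opted out of crashing
        }
        return DAEMON_DEFAULT;                    // unparseable value: fall back to the default
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("unset      -> " + resolve(conf));
        conf.setProperty(EXIT_ON_ERROR_KEY, "false");
        System.out.println("set false  -> " + resolve(conf));
    }
}
```

Falling back to the default for an unparseable value is a design choice of this sketch: a typo
in the config file should not silently disable crash-on-error.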

> Retrospect on the decision of making yarn.dispatcher.exit-on-error as true explicitly
in daemons
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4079
>                 URL: https://issues.apache.org/jira/browse/YARN-4079
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
