hadoop-yarn-issues mailing list archives

From "Maysam Yabandeh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
Date Sat, 01 Jun 2013 04:58:21 GMT

     [ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maysam Yabandeh updated YARN-713:

    Attachment: YARN-713.patch

In the attached patch, the exception is handled in RMContainerTokenSecretManager#createContainerToken
by returning null. A null value is supposed to trigger a retry, as in FifoScheduler#assignContainer:

        if (containerToken == null) {
          return i; // Try again later.
        }
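
The null-return pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the attached patch: the class TokenRetrySketch, the ServiceResolver interface, and the string "token" are hypothetical stand-ins, and it assumes the DNS failure surfaces as an unchecked exception whose cause is UnknownHostException.

```java
import java.net.UnknownHostException;

// Hypothetical sketch; names here are illustrative, not the real Hadoop classes.
class TokenRetrySketch {
    // Stand-in for a helper that resolves a host and may fail when DNS is down.
    interface ServiceResolver {
        String buildTokenService(String host);
    }

    // Returns a token string, or null if the host cannot be resolved right now.
    static String createContainerToken(ServiceResolver resolver, String host) {
        try {
            return "token-for-" + resolver.buildTokenService(host);
        } catch (IllegalArgumentException e) {
            if (e.getCause() instanceof UnknownHostException) {
                return null; // DNS failure: signal the caller to retry later
            }
            throw e; // any other argument error is still a real bug
        }
    }

    // Assigns up to `wanted` containers; returns how many were assigned
    // before a null token forced an early exit.
    static int assignContainers(ServiceResolver resolver, String host, int wanted) {
        for (int i = 0; i < wanted; i++) {
            String containerToken = createContainerToken(resolver, host);
            if (containerToken == null) {
                return i; // Try again later.
            }
        }
        return wanted;
    }
}
```

With this shape, a DNS outage reduces the number of containers assigned in the current pass instead of propagating an exception up to the dispatcher.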

Regarding the sweep of the RM to find other places where a DNS failure should be handled properly,
I think a cleaner approach is to throw UnknownHostException directly instead of hiding it
in an IllegalArgumentException, which is also semantically confusing. This, however, would result
in widespread changes all over the project, as each user of SecurityUtil must either handle
the exception or declare it so that its callers catch it. If this approach is fine with you
guys, I can give it a go.

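
To illustrate why the checked-exception approach spreads through the code base, here is a rough sketch. Everything in it is hypothetical (the class CheckedDnsSketch, the dnsUp flag that simulates an outage, and the port in the returned string); it only shows that each caller must either declare UnknownHostException or catch it.

```java
import java.net.UnknownHostException;

// Hypothetical sketch; not the real SecurityUtil API.
class CheckedDnsSketch {
    // The resolver declares the checked exception instead of wrapping it.
    // `dnsUp` simulates whether name resolution currently works.
    static String buildTokenService(String host, boolean dnsUp) throws UnknownHostException {
        if (!dnsUp) {
            throw new UnknownHostException(host);
        }
        return host + ":8041"; // port chosen only for illustration
    }

    // Option 1: an intermediate caller declares the exception and passes it up.
    static String createToken(String host, boolean dnsUp) throws UnknownHostException {
        return "token-for-" + buildTokenService(host, dnsUp);
    }

    // Option 2: a caller handles it at a point where retrying makes sense.
    static boolean tryAssign(String host, boolean dnsUp) {
        try {
            createToken(host, dnsUp);
            return true;
        } catch (UnknownHostException e) {
            return false; // the caller retries on a later scheduling pass
        }
    }
}
```

The compiler forces every call path to make one of these two choices, which is the source of the widespread changes mentioned above.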
> ResourceManager can exit unexpectedly if DNS is unavailable
> -----------------------------------------------------------
>                 Key: YARN-713
>                 URL: https://issues.apache.org/jira/browse/YARN-713
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-713.patch, YARN-713.patch
>
> As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to
> an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would
> cause the RM to exit.  The RM should not exit during DNS hiccups.

