hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4721) RM to try to auth with HDFS on startup, retry with max diagnostics on failure
Date Fri, 04 Mar 2016 21:45:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180595#comment-15180595 ]

Vinod Kumar Vavilapalli commented on YARN-4721:
-----------------------------------------------

bq. I don't know what the policy should be if the RM can't auth to HDFS at this point.
By design, (most of) the RM is agnostic of file-systems.

bq. Instead, the RM could try to talk to HDFS on launch, ls / should suffice. If it can't
auth, it can then tell UGI to log more and retry.
There are only a couple of places with run-time dependencies: (a) the user passes HDFS
delegation-tokens for auto-renewal; (b) some of the generic-history / Timeline-Service implementations
are file-system based. But those are run-time dependencies, and we should actively avoid any static
dependencies like "ls /".

I don't understand the patch completely, but it seems like you are adding extra validation
checks to make sure that the RM can authenticate successfully with *kerberos* (and log diagnostics
in case of failures), not with HDFS itself specifically. If I am reading that right, it should
be okay to do such diagnostics.
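For reference, the kind of diagnostics being discussed can be reproduced by hand today. A minimal sketch, assuming a deployment where the standard Hadoop debug switches (HADOOP_JAAS_DEBUG and the JVM's sun.security.krb5.debug property) take effect and an hdfs client is on the PATH; the error message is illustrative, not from the patch:

```shell
# Enable verbose Kerberos/JAAS login tracing, then attempt a simple
# authenticated HDFS operation -- the "ls /" probe from the issue description.
export HADOOP_JAAS_DEBUG=true                        # JAAS login trace
export HADOOP_OPTS="-Dsun.security.krb5.debug=true"  # JVM Kerberos trace

if ! hdfs dfs -ls / > /dev/null; then
  echo "Cannot authenticate to HDFS; check keytab, KDC and clock skew" >&2
fi
```

This is a cluster-specific diagnostic fragment, not something runnable outside a kerberized Hadoop deployment.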

> RM to try to auth with HDFS on startup, retry with max diagnostics on failure
> -----------------------------------------------------------------------------
>
>                 Key: YARN-4721
>                 URL: https://issues.apache.org/jira/browse/YARN-4721
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-12889-001.patch
>
>
> If the RM can't auth with HDFS, this can first surface during job submission, which can
cause confusion about what's wrong and whose credentials are playing up.
> Instead, the RM could try to talk to HDFS on launch, {{ls /}} should suffice. If it can't
auth, it can then tell UGI to log more and retry.
> I don't know what the policy should be if the RM can't auth to HDFS at this point. Certainly
it can't currently accept work. But should it fail fast or keep going in the hope that the
problem is in the KDC or NN and will fix itself without an RM restart?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
