hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
Date Fri, 06 Feb 2015 01:37:34 GMT

    [ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308422#comment-14308422 ]

Vinod Kumar Vavilapalli commented on YARN-3021:

Though the patch unblocks the jobs in the short term, it still seems bad in the long
term. Applications that want to run for longer than 7 days in such setups will simply fail,
with no other recourse.

Maybe the solution is the following:
 - Explicitly have an external renewer system that has the right permissions to renew these
tokens. Working with such an external renewer system needs support in frameworks, e.g.
in MapReduce, a renewal server list similar to mapreduce.job.hdfs-servers.
 - RM can simply inspect the incoming renewer specified in the token and skip renewing those
tokens if the renewer doesn't match its own address. This way, we don't need an explicit
API in the submission context.
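
A rough sketch of the second point, to make the idea concrete. This is not the patch and does not use Hadoop's classes: TokenInfo, tokensToRenew and the plain string comparison are hypothetical stand-ins, and a real check would have to compare the renewer recorded in the token identifier against the RM's own principal/address more carefully.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these types are Hadoop classes.
final class RenewerFilterSketch {

    // Stand-in for a delegation token, reduced to the fields this sketch needs.
    static final class TokenInfo {
        final String service;  // where the token was issued, e.g. a NameNode
        final String renewer;  // renewer recorded in the token

        TokenInfo(String service, String renewer) {
            this.service = service;
            this.renewer = renewer;
        }
    }

    // Keep only the tokens whose designated renewer is this RM. The others are
    // skipped and left to an external renewer, so no new submission-context API
    // is needed to mark them.
    static List<TokenInfo> tokensToRenew(List<TokenInfo> submitted, String rmRenewer) {
        List<TokenInfo> mine = new ArrayList<>();
        for (TokenInfo token : submitted) {
            if (rmRenewer.equals(token.renewer)) {
                mine.add(token);
            }
        }
        return mine;
    }
}
```

With a token for cluster A naming the RM as renewer and a token for cluster B naming some external renewer, only the first would be handed to the RM's renewal logic.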

Apologies for going back and forth on this one. Does that work? /cc [~jianhe], [~kasha].

Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not
work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip
adding the app itself to DTR, the app will be completely stuck.
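
To illustrate the concern, here is a toy model (hypothetical names and states throughout, not the real DelegationTokenRenewer or RMApp code) in which the skip decision is made per token while the application is always registered, so the event that moves the app forward still fires:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; states and method names are assumptions for illustration.
final class RenewerRegistrationSketch {

    enum AppState { NEW, SUBMITTED }

    static final class App {
        final List<String> tokenRenewers;  // renewer named by each token
        final List<String> scheduled = new ArrayList<>();
        AppState state = AppState.NEW;

        App(List<String> tokenRenewers) {
            this.tokenRenewers = tokenRenewers;
        }
    }

    // Always called for every app in secure mode. Individual tokens may be
    // skipped, but the state transition at the end is never skipped.
    static void addApplication(App app, String rmRenewer) {
        for (String renewer : app.tokenRenewers) {
            if (rmRenewer.equals(renewer)) {
                app.scheduled.add(renewer);  // RM will renew this token
            }
            // otherwise: skip this token only, not the whole app
        }
        app.state = AppState.SUBMITTED;  // the renewer drives the app forward
    }
}
```

If addApplication were never called for an app whose tokens are all skipped, that app would stay in NEW forever in this model, which is the stuck-app problem described above.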

> YARN's delegation-token handling disallows certain trust setups to operate properly
> -----------------------------------------------------------------------------------
>                 Key: YARN-3021
>                 URL: https://issues.apache.org/jira/browse/YARN-3021
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Harsh J
>         Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts
COMMON (both are one-way trusts), and both A and B run HDFS + YARN clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to
access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…)
synchronously during application submission (to validate the managed token before it adds
it to a scheduler for automatic renewal). The call obviously fails because realm B will not
trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously and once
the renewal attempt failed we simply ceased to schedule any further attempts of renewals,
rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on the failure
and skip the scheduling alone, rather than bubble back an error to the client, failing the
app submission. This way the old behaviour is retained.
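
The behaviour requested in the issue description (attempt the renewal, but on failure only skip the scheduling) could look roughly like the sketch below. The Renewer interface and scheduleRenewable are hypothetical and tokens are reduced to plain strings; this is neither the attached patch nor the 1.x JobTracker code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of lenient renewal; not Hadoop code.
final class LenientRenewalSketch {

    // Stand-in for whatever performs the actual renewal call.
    interface Renewer {
        long renew(String token) throws IOException;
    }

    // Tries each token once. Tokens that renew successfully are returned for
    // scheduling; a failed renewal is swallowed so that the submission itself
    // does not fail, and that token is simply never scheduled.
    static List<String> scheduleRenewable(List<String> tokens, Renewer renewer) {
        List<String> scheduled = new ArrayList<>();
        for (String token : tokens) {
            try {
                renewer.renew(token);
                scheduled.add(token);
            } catch (IOException e) {
                // Go easy on the failure: no error goes back to the client.
            }
        }
        return scheduled;
    }
}
```

A token that is skipped this way still expires at the end of its lifetime, which is the long-running-application limitation raised in the comment above.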

This message was sent by Atlassian JIRA