hadoop-yarn-issues mailing list archives

From "Clay B. (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
Date Sat, 10 Dec 2016 00:12:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736764#comment-15736764 ]

Clay B. commented on YARN-5910:

This conversation has been very educational for me; thank you! I am still concerned that if
we do not use Kerberos, the requesting user will have no way to renew tokens as themselves.
If we cannot authenticate as the user, won't we be unable to operate when the administrators
of the two clusters differ (and thus do not have the same {{yarn}} user setup -- e.g. two
different Kerberos principals)? Can we find a solution to that issue here as well (or ensure
that this issue doesn't preclude one)?

I really like the idea that the (human) client is responsible for specifying the resources
needed: in a highly federated Hadoop environment, one administration group may not even know
of all the clusters, and this allows for more agile cross-cluster usage.

I see there are two issues here I was hoping to solve:
1. A remote cluster's services are needed (e.g. as a data source to this job)
2. A remote cluster does not trust this cluster's YARN principal

[~jlowe] brings up some good questions and points which hit this well:
{quote}I'm not sure distributing the keytab is going to be considered a reasonable thing to
do in some setups. Part of the point of getting a token is to avoid needing to ship a keytab
everywhere. Once we have a keytab, is there a need to have a token?{quote}

If the YARN principals of each cluster are different but the user is entitled to services
on both clusters is there another way around this issue? Further, while I think many shops
may have the kerberos tooling to avoid shipping keytabs, some shops are heavily HBase (e.g.
long running query services) dependent or streaming centric (jobs last longer than maximal
token refresh periods) and thus have to use keytabs today.

{quote}There's also the problem of needing to renew the token while the AM is waiting to get
scheduled if the cluster is really busy. If the AM isn't running it can't renew the token.{quote}

I would expect the remote-cluster resources not to be central to operating the job; e.g. we
would use the local cluster for HDFS and YARN but might want to access a remote cluster's
YARN. If the AM can request tokens (i.e. with a keytab or a proxy Kerberos credential which
was refreshed by the RM), then we can request new tokens when the job is scheduled, even if
it was held up longer than the renewal time; further, any exploit of custom configuration
would then run in a process owned by the user, not in a privileged process.

Regardless, are there many clusters today where the scheduling time is longer than the
renewal interval of a delegation token? (By default that is one seventh of the token's
maximal lifetime -- i.e. longer than a day.)
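To make the arithmetic behind that "one seventh" concrete -- a minimal sketch assuming the stock HDFS defaults ({{dfs.namenode.delegation.token.renew-interval}} of one day, {{dfs.namenode.delegation.token.max-lifetime}} of seven days), which any cluster may override:

```java
public class TokenLifetimeMath {
    public static void main(String[] args) {
        // Assumed stock HDFS defaults (clusters may override these):
        long renewIntervalMs = 86_400_000L;  // dfs.namenode.delegation.token.renew-interval (1 day)
        long maxLifetimeMs = 604_800_000L;   // dfs.namenode.delegation.token.max-lifetime (7 days)
        // A token must be renewed every renew-interval but lives at most
        // max-lifetime, so the renewal window is 1/7 of the token's lifetime.
        System.out.println(maxLifetimeMs / renewIntervalMs);          // prints 7
        System.out.println(renewIntervalMs / 3_600_000L + " hours");  // prints 24 hours
    }
}
```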

{quote}My preference is to have the token be as self-descriptive as we can possibly get. Doing
the ApplicationSubmissionContext thing could work for the HA case, but I could see this being
a potentially non-trivial payload the RM has to bear for each app (configs can get quite large).
I'd rather avoid adding that to the context for this purpose if we can do so, but if the
token cannot be self-descriptive in all cases then we may not have much other choice that
I can see.{quote}

I agree this seems to be the sanest way to get the configuration in; could we also extend
the various delegation token types to include this payload only optionally? Then the RM would
only pay the price when an off-cluster request actually needs it.
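As a sketch of what an optional payload could mean on the wire -- purely hypothetical: the format, flag, and class name below are illustrations, not an existing Hadoop API -- the identifier could carry a presence flag and, only when set, a length-prefixed configuration blob, so on-cluster tokens pay no extra bytes:

```java
import java.io.*;

public class OptionalPayloadSketch {
    // Hypothetical wire format: length-prefixed identifier bytes, then one
    // boolean flag, then (only if the flag is set) a length-prefixed config blob.
    static byte[] write(byte[] identifier, byte[] configPayload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(identifier.length);
        out.write(identifier);
        boolean hasPayload = configPayload != null;
        out.writeBoolean(hasPayload);       // only off-cluster requests set this
        if (hasPayload) {
            out.writeInt(configPayload.length);
            out.write(configPayload);
        }
        out.flush();
        return bos.toByteArray();
    }

    // Returns { identifier, payload }; payload is null when the flag was unset.
    static byte[][] read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        byte[] id = new byte[in.readInt()];
        in.readFully(id);
        byte[] payload = null;
        if (in.readBoolean()) {
            payload = new byte[in.readInt()];
            in.readFully(payload);
        }
        return new byte[][] { id, payload };
    }

    public static void main(String[] args) throws IOException {
        byte[] onCluster = write("token-ident".getBytes(), null);
        byte[] offCluster = write("token-ident".getBytes(), "<configuration/>".getBytes());
        System.out.println(read(onCluster)[1] == null);       // prints true
        System.out.println(new String(read(offCluster)[1]));  // prints <configuration/>
    }
}
```

The point of the flag is that the common (on-cluster) case serializes one extra byte, while only tokens aimed at an unknown remote cluster carry the configuration cost.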

> Support for multi-cluster delegation tokens
> -------------------------------------------
>                 Key: YARN-5910
>                 URL: https://issues.apache.org/jira/browse/YARN-5910
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: security
>            Reporter: Clay B.
>            Priority: Minor
> As an administrator running many secure (kerberized) clusters, some which have peer clusters
managed by other teams, I am looking for a way to run jobs which may require services running
on other clusters. Particular cases where this rears itself are running something as core
as a distcp between two kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp
hdfs://LOCALCLUSTER/user/user292/test.out hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while, but if the delegation token for the remote
cluster needs renewal the job will fail [1]. One can pre-configure the {{hdfs-site.xml}}
loaded by the YARN RM to know of all possible HDFS instances, but that requires coordination
that is not always feasible, especially as a cluster's peers grow into the tens of clusters
or across management teams. Ideally, core systems could be configured this way, but jobs
could also specify their own token handling and management when needed.
> [1]: Example stack trace when the RM is unaware of a remote service:
> ----------------
> {code}
> 2016-03-23 14:59:50,528 INFO org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
application_1458441356031_3317 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service:
ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> 2016-03-23 14:59:50,557 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER,
Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 'hdfs://REMOTECLUSTER'
to a NameNode. Local configuration does not have a failover proxy provider configured.
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
> at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
> ... 6 more
> {code}
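The "failover proxy provider" the trace complains about is client-side configuration that must exist for each remote HA nameservice. A minimal {{hdfs-site.xml}} fragment for one remote peer looks roughly like this (a sketch: hostnames, ports, and namenode IDs are placeholders):

```xml
<!-- Teach the client (or RM) to resolve hdfs://REMOTECLUSTER.
     Hostnames and ports below are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>LOCALCLUSTER,REMOTECLUSTER</value>
</property>
<property>
  <name>dfs.ha.namenodes.REMOTECLUSTER</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.REMOTECLUSTER</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Multiplying this stanza across tens of peer clusters, each managed by a different team, is exactly the coordination burden described above.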

This message was sent by Atlassian JIRA
