hadoop-yarn-issues mailing list archives

From "Jian He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
Date Wed, 18 Jan 2017 19:44:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654 ]

Jian He commented on YARN-5910:

Hi Jason, thank you very much for the review!
bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is
not the default value.
Removed it.

bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if
this can cause issues where the current cluster isn't secure but the RM needs to renew the
job's tokens for a remote secure cluster or some other secure service. Seems like if this
conf is set then that's all we need to know.
Currently, the RM DelegationTokenRenewer will only add the tokens if security is enabled
(code in RMAppManager#submitApplication), so I think with this existing implementation we can
assume this feature is for security-enabled clusters only?

bq. Similarly the code explicitly fails in ClientRMService if the conf is there when security
is disabled which seems like we're taking a case that isn't optimal but should work benignly
and explicitly making sure it fails. Not sure that's user friendly behavior.
My intention was to prevent users from sending the conf in non-secure mode (which anyway is not
needed if my reply above is correct), in case the conf size is huge, which may increase load on
the RM. On the other hand, Varun suggested offline that we can add a config in the RM to limit
the size of the configs; what's your opinion?
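The size limit being discussed could look roughly like the following sketch. Note this is purely illustrative: the class name, method, and default cap are hypothetical, not actual YARN configuration keys or code.

```java
// Hypothetical sketch of the size guard discussed above. The class, method,
// and default limit are illustrative only; they do not exist in YARN.
public class TokenConfSizeCheck {
    // Illustrative default cap on the serialized token-conf payload, in bytes.
    static final int DEFAULT_MAX_TOKEN_CONF_BYTES = 64 * 1024;

    /** Rejects a serialized token conf that exceeds the configured limit. */
    static void checkTokenConfSize(byte[] serializedConf, int maxBytes) {
        if (serializedConf != null && serializedConf.length > maxBytes) {
            throw new IllegalArgumentException(
                "Token conf is " + serializedConf.length
                + " bytes, which exceeds the limit of " + maxBytes);
        }
    }

    public static void main(String[] args) {
        // A small conf passes; an oversized one is rejected at submission time.
        checkTokenConfSize(new byte[1024], DEFAULT_MAX_TOKEN_CONF_BYTES);
        try {
            checkTokenConfSize(new byte[128 * 1024], DEFAULT_MAX_TOKEN_CONF_BYTES);
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected oversized conf");
        }
    }
}
```

Checking at submission time keeps an oversized payload from ever reaching the renewer, which addresses the RM-load concern without a hard failure in non-secure mode.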

bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the rewind method
calls seem unnecessary since we're throwing the buffers away immediately afterwards.
Actually, the ByteBuffer is a direct reference to the buffer in the containerLaunchContext, not
a copy. I think the rewind is still required; it was added specifically to solve the issue in
YARN-2893.
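The point about shared buffers can be shown with a minimal JDK-only example: reading a ByteBuffer advances its position, so when the buffer is a shared reference rather than a copy, a later reader sees nothing left unless someone rewinds it.

```java
import java.nio.ByteBuffer;

// Demonstrates why rewinding matters when a ByteBuffer is a shared reference
// rather than a throwaway copy: a read advances the position, and without a
// rewind the next reader of the same buffer object finds no remaining bytes.
public class SharedBufferDemo {
    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});

        byte[] sink = new byte[4];
        shared.get(sink);                        // first read consumes the buffer
        System.out.println(shared.remaining());  // prints 0: nothing left to read

        shared.rewind();                         // restore position for later readers
        System.out.println(shared.remaining());  // prints 4 again
    }
}
```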

bq. Should the Configuration constructor call in parseTokensConf be using the version that
does not load defaults? If not then I recommend we at least allow a conf to be passed in to
use as a copy constructor. Loading a new Configuration from scratch is really expensive and
we should avoid it if possible. See the discussion on HADOOP-11223 for details.
Good point. I actually did the same in the YarnRunner#setAppConf method but missed this place.

bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when renewing the tokens?

I wasn't sure whether the bare appConf is enough for the connection (are any Kerberos-related
configs from the RM itself required for authentication?). Let me do some experiments; if this
works, I'll just use the appConf.
bq. Also it looks like we're polluting subsequent app-conf renewals with prior app configurations,
as well as simply leaking appConf objects as renewerConf resources ad infinitum. I don't see
where renewerConf gets reset in between.
My previous patch made a copy of each appConf, merged it with the RM's conf (because I wasn't
sure whether the RM's own conf is required), and used that for the renewer. But I think this may
be bad because every app would have its own copy of the configs, which could greatly increase
memory usage if the number of apps is very large. So in the latest patch I changed it to let all
apps share the same renewerConf. This is based on the assumption that "dfs.nameservices" must
have distinct keys for each distinct cluster, so we won't have a situation where two apps use
different configs for the same cluster. It is true that unnecessary configs used by the first
app will be shared by subsequent apps.
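The shared-renewerConf approach can be illustrated with a plain Map standing in for Hadoop's Configuration (the class and method here are illustrative, not YARN code). Because per-nameservice keys are assumed distinct per cluster, later apps never overwrite an earlier app's cluster settings, though stale keys from earlier apps do accumulate, as discussed above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the shared-renewerConf merge described above, using a plain
// Map as a stand-in for Hadoop's Configuration. Per-nameservice keys such as
// "dfs.ha.namenodes.<nameservice>" are distinct per cluster, so merges never
// collide; but keys from earlier apps are never removed, so they accumulate.
public class SharedRenewerConf {
    static final Map<String, String> renewerConf = new HashMap<>();

    /** Merge an app's token conf into the single shared renewer conf. */
    static void addAppConf(Map<String, String> appConf) {
        renewerConf.putAll(appConf);  // merged in; never reset between apps
    }

    public static void main(String[] args) {
        // App 1 ships config for CLUSTER-A, app 2 for CLUSTER-B.
        addAppConf(Map.of("dfs.ha.namenodes.CLUSTER-A", "nn1,nn2"));
        addAppConf(Map.of("dfs.ha.namenodes.CLUSTER-B", "nn1,nn2"));

        // Both clusters resolve, but CLUSTER-A's key is still present during
        // app 2's renewals even though app 2 never asked for it.
        System.out.println(renewerConf.containsKey("dfs.ha.namenodes.CLUSTER-A"));
        System.out.println(renewerConf.containsKey("dfs.ha.namenodes.CLUSTER-B"));
    }
}
```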

bq. Arguably there should be a unit test that verifies a first app with token conf key A
and a second app with token conf key B doesn't leave a situation where the renewals of the
second app are polluted with conf key A.
If using the bare appConf works, we should be fine.

bq. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass the new conf but
not a new test that verifies the specified conf is used appropriately when renewing for that
app and not for other apps that didn't specify a conf.
Yep, I'll add the UT.

> Support for multi-cluster delegation tokens
> -------------------------------------------
>                 Key: YARN-5910
>                 URL: https://issues.apache.org/jira/browse/YARN-5910
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: security
>            Reporter: Clay B.
>            Assignee: Jian He
>            Priority: Minor
>         Attachments: YARN-5910.01.patch, YARN-5910.2.patch, YARN-5910.3.patch, YARN-5910.4.patch
> As an administrator running many secure (kerberized) clusters, some of which have peer clusters
managed by other teams, I am looking for a way to run jobs which may require services running
on other clusters. A particular case where this rears its head is running something as core
as a distcp between two kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp
hdfs://LOCALCLUSTER/user/user292/test.out hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while, but if the delegation token for the remote
cluster needs renewal the job will fail[1]. One can pre-configure the {{hdfs-site.xml}}
loaded by the YARN RM to know of all possible HDFSes available, but that requires coordination
that is not always feasible, especially as a cluster's peers grow into the tens of clusters
or across management teams. Ideally, one could have core systems configured this way, but jobs
could also specify their own token handling and management when needed?
> [1]: Example stack trace when the RM is unaware of a remote service:
> ----------------
> {code}
> 2016-03-23 14:59:50,528 INFO org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
application_1458441356031_3317 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service:
ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> 2016-03-23 14:59:50,557 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER,
Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 'hdfs://REMOTECLUSTER'
to a NameNode. Local configuration does not have a failover proxy provider configured.
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
> at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
> ... 6 more
> {code}
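The stack trace above shows the RM failing because its local configuration cannot map the remote HA nameservice to a NameNode. For reference, the per-cluster HDFS client settings a job would need to ship for a remote HA nameservice look roughly like the following. The keys are standard HDFS client configuration properties; the cluster and host names are illustrative placeholders, not values from this issue.

```xml
<!-- Illustrative hdfs-site.xml entries; REMOTECLUSTER and the host names
     are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>LOCALCLUSTER,REMOTECLUSTER</value>
</property>
<property>
  <name>dfs.ha.namenodes.REMOTECLUSTER</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.REMOTECLUSTER</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Shipping entries like these per job, rather than baking every peer cluster into the RM's own hdfs-site.xml, is exactly the coordination problem this issue aims to solve.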

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
