Date: Wed, 18 Jan 2017 19:44:26 +0000 (UTC)
From: "Jian He (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens

    [ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654 ]

Jian He commented on YARN-5910:
-------------------------------

Hi Jason, thank you very much for the review!

bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value.

Removed it.

bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know.
Currently, the RM DelegationTokenRenewer will only add the tokens if security is enabled (see the code in RMAppManager#submitApplication), so I think with this existing implementation we can assume this feature is for security-enabled clusters only?

bq. Similarly the code explicitly fails in ClientRMService if the conf is there when security is disabled, which seems like we're taking a case that isn't optimal but should work benignly and explicitly making sure it fails. Not sure that's user-friendly behavior.

My intention was to prevent users from sending the conf in non-secure mode (which is not needed anyway, if my reply above is correct), in case the conf is huge and increases the load on the RM. On the other hand, Varun suggested offline that we could add a config in the RM to limit the size of the configs; what's your opinion? A rough sketch of the check I have in mind is at the end of this comment.

bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the rewind method calls seem unnecessary since we're throwing the buffers away immediately afterwards.

Actually, the ByteBuffer is a direct reference to the one in the ContainerLaunchContext, not a copy, so I think the rewind is still required; it was added specifically to solve the issues in YARN-2893.

bq. Should the Configuration constructor call in parseTokensConf be using the version that does not load defaults? If not then I recommend we at least allow a conf to be passed in to use as a copy constructor. Loading a new Configuration from scratch is really expensive and we should avoid it if possible. See the discussion on HADOOP-11223 for details.

Good point. I actually did the same in the YarnRunner#setAppConf method but missed this place (a minimal sketch of the parseTokensConf I have in mind is at the end of this comment).

bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when renewing the tokens?

I wasn't sure whether the appConf alone is enough for the connection (are any Kerberos-related configs of the RM itself required for authentication?). Let me do some experiments; if that works, I'll just use the appConf.

bq. Also it looks like we're polluting subsequent app-conf renewals with prior app configurations, as well as simply leaking appConf objects as renewerConf resources ad infinitum. I don't see where renewerConf gets reset in-between.

My previous patch made a copy of each appConf, merged it with the RM's conf (because I wasn't sure whether the RM's own conf is required), and used that for the renewer. But I think this may be bad, because every app would have its own copy of the configs, which could greatly increase memory usage if the number of apps is very big. So in the latest patch I changed it to let all apps share the same renewerConf. This is based on the assumption that the nameservices in {{dfs.nameservices}} must have distinct keys for each distinct cluster, so we won't have a situation where two apps use different configs for the same cluster (an illustration is at the end of this comment). It is true that unnecessary configs used by the first app will be shared by subsequent apps.

bq. Arguably there should be a unit test that verifies a first app with token conf key A and a second app with token conf key B doesn't leave a situation where the renewals of the second app are polluted with conf key A.

If using the appConf alone works, we should be fine.

bq. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass the new conf, but not a new test that verifies the specified conf is used appropriately when renewing for that app and not for other apps that didn't specify a conf.

Yep, I'll add the UT.
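For the size limit, this is roughly the guard I have in mind in ClientRMService; the config key name and default below are illustrative placeholders only, not actual patch code:

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class TokensConfSizeCheck {
  // Hypothetical size guard for the tokens conf sent at app submission.
  // The key name and the 128 KB default are assumptions for illustration.
  public static void check(ByteBuffer tokensConf, Configuration rmConf)
      throws YarnException {
    int maxSize = rmConf.getInt(
        "yarn.resourcemanager.app-tokens-conf.max-size-bytes", 128 * 1024);
    if (tokensConf != null && tokensConf.remaining() > maxSize) {
      throw new YarnException("tokensConf is " + tokensConf.remaining()
          + " bytes, which exceeds the configured limit of " + maxSize
          + " bytes");
    }
  }
}
{code}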
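And to make the rewind and no-defaults points concrete, here is a minimal sketch of the parseTokensConf shape I intend; the exact code may differ in the next patch:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputByteBuffer;

public class TokensConfParser {
  public static Configuration parseTokensConf(ByteBuffer tokensConf)
      throws IOException {
    DataInputByteBuffer dib = new DataInputByteBuffer();
    dib.reset(tokensConf);
    // Pass false so core-default.xml / core-site.xml are not loaded;
    // building a Configuration with defaults is expensive (HADOOP-11223).
    Configuration conf = new Configuration(false);
    conf.readFields(dib);
    // The buffer is a direct reference into the ContainerLaunchContext,
    // not a copy, so rewind it for subsequent readers (see YARN-2893).
    tokensConf.rewind();
    return conf;
  }
}
{code}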
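Finally, to illustrate the distinct-keys assumption behind the shared renewerConf (nameservice ids and hostnames are invented for the example): every HDFS HA client key is qualified by its nameservice, so configs contributed by different apps for different clusters cannot collide.

{code}
import org.apache.hadoop.conf.Configuration;

public class RenewerConfIllustration {
  public static void main(String[] args) {
    Configuration renewerConf = new Configuration(false);
    // Keys contributed by app 1, which talks to REMOTECLUSTER1:
    renewerConf.set("dfs.ha.namenodes.REMOTECLUSTER1", "nn1,nn2");
    renewerConf.set("dfs.namenode.rpc-address.REMOTECLUSTER1.nn1",
        "nn1.remote1.example.com:8020");
    renewerConf.set("dfs.client.failover.proxy.provider.REMOTECLUSTER1",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    // Keys contributed by app 2, which talks to REMOTECLUSTER2; every key
    // is qualified by the nameservice, so nothing from app 1 is overwritten:
    renewerConf.set("dfs.ha.namenodes.REMOTECLUSTER2", "nn1,nn2");
    renewerConf.set("dfs.namenode.rpc-address.REMOTECLUSTER2.nn1",
        "nn1.remote2.example.com:8020");
    renewerConf.set("dfs.client.failover.proxy.provider.REMOTECLUSTER2",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
  }
}
{code}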
> Support for multi-cluster delegation tokens
> -------------------------------------------
>
>                 Key: YARN-5910
>                 URL: https://issues.apache.org/jira/browse/YARN-5910
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: security
>            Reporter: Clay B.
>            Assignee: Jian He
>            Priority: Minor
>         Attachments: YARN-5910.01.patch, YARN-5910.2.patch, YARN-5910.3.patch, YARN-5910.4.patch
>
>
> As an administrator running many secure (kerberized) clusters, some of which have peer clusters managed by other teams, I am looking for a way to run jobs which may require services running on other clusters. A particular case where this rears itself is running something as core as a distcp between two kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp hdfs://LOCALCLUSTER/user/user292/test.out hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while, but if the delegation token for the remote cluster needs renewal the job will fail[1]. One can pre-configure their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes available, but that requires coordination that is not always feasible, especially as a cluster's peers grow into the tens of clusters or across management teams. Ideally, one could have core systems configured this way, but jobs could also specify their own token handling and management when needed.
> [1]: Example stack trace when the RM is unaware of a remote service:
> ----------------
> {code}
> 2016-03-23 14:59:50,528 INFO org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: application_1458441356031_3317 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> 2016-03-23 14:59:50,557 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a failover proxy provider configured.
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
> at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
> ... 6 more
> {code}