hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
Date Tue, 28 Oct 2014 09:29:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186642#comment-14186642 ]

Steve Loughran commented on HDFS-7295:

bq. What Steve Loughran said.

I don't know whether to be pleased or scared by the fact you are agreeing with me. Maybe both.


bq. My concern is that the damage from a stolen keytab is far greater than from a stolen HDFS token.
It's a universal Kerberos identity versus something that works only with HDFS.

In a more complex application you end up needing to authenticate IPC/REST between different
services anyway. Example: a pool of Tomcat instances talking to HBase in YARN, running against
HDFS. Keytabs avoid having different solutions for different parts of the stack. For the example
cited, I'd just have a single "app" account for the HBase and Tomcat instances, and {{sudo}}
launch them all as that user.

bq. The ops team might consider a longer delegation token to be lower risk than having a more
valuable asset, the user's keytab, exposed on a wide surface area (we need all nodes to
have access to the keytabs).

Push it out during localization; rely on the NM to set up the paths securely and to clean
up afterwards. The weaknesses then become:
# Packet sniffing. Better encrypt your wires.
# The NM process fails and the container then terminates: no cleanup.
# Malicious processes able to gain root access to the system. But if they can do that, they
can get at plenty of other things anyway...

bq. Using keytabs for headless accounts will work for services that do not use the user account.
Spark Streaming, for example, runs as the user just like MapReduce. This would mean asking
users to create and deploy keytabs for those scenarios, correct?

It depends on the duration of the instance. Short-lived: no. Medium-lived: no. Long-lived: you
need a keytab, but it does not have to be that of the user submitting the job, merely one
with access to the (persistent) data.

bq. Perhaps we can add a whitelist/blacklist for who can set an arbitrary lifetime on their DT,
and whether there is a cap on that lifetime.
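A minimal sketch of what such a policy could look like, assuming a per-user allowlist/denylist and a single cluster-wide cap. `LifetimePolicy`, `effective_max_lifetime`, and the 30-day `hard_cap_ms` are hypothetical names and values, not Hadoop APIs or settings; the 7-day default is taken from the issue description.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DAY_MS = 24 * 60 * 60 * 1000
DEFAULT_MAX_LIFETIME_MS = 7 * DAY_MS  # the 7-day limit described in HDFS-7295


@dataclass
class LifetimePolicy:
    """Hypothetical policy: who may request a longer token lifetime, and up to what cap."""
    allowlist: Set[str] = field(default_factory=set)
    denylist: Set[str] = field(default_factory=set)
    hard_cap_ms: int = 30 * DAY_MS  # assumed cap, not a real Hadoop setting

    def effective_max_lifetime(self, user: str, requested_ms: Optional[int] = None) -> int:
        if requested_ms is not None and requested_ms <= 0:
            raise ValueError("requested lifetime must be positive")
        # No explicit request, denylisted, or not allowlisted: fall back to the default.
        if requested_ms is None or user in self.denylist or user not in self.allowlist:
            return DEFAULT_MAX_LIFETIME_MS
        # Allowlisted users get what they asked for, clamped to the cluster-wide cap.
        return min(requested_ms, self.hard_cap_ms)
```

For example, with `allowlist={"spark-streaming"}`, a 90-day request from `spark-streaming` is clamped to 30 days, while the same request from any other user falls back to 7 days. Even this toy version adds three new knobs (allowlist, denylist, cap) that ops teams would have to reason about.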

This adds even more complexity to a security system that is already hard for some people
(myself, for example) to understand.

bq. It's straightforward to build a revocation mechanism, along with some stats reporting
on DT usage, plus auditing.

Yes, but does it scale? Is every request going to have to trigger a token revocation check,
or just a fraction? Even with a fraction, what load ends up being placed on the infrastructure,
potentially including the enterprise-wide Kerberos/AD systems? We also need to think about
the availability of this token revocation check infrastructure: whether to embed it in the NN,
adding more overhead there (as well as more data to keep in sync), or to deploy and manage some
separate token revocation infrastructure. I am not, personally, enthused by the idea.
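To make the scaling question concrete, here is a rough sketch of a sampled revocation check: only a fraction of uncached requests consult a (hypothetical) remote revocation lookup, and results are cached per token for a short TTL. `RevocationChecker`, `lookup`, `sample_rate`, and `cache_ttl_s` are illustrative names, not an existing Hadoop API. It makes the trade-off explicit: unsampled requests from a revoked token get through until a sampled check catches it.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple


class RevocationChecker:
    """Hypothetical sampled revocation check with a short-lived per-token cache."""

    def __init__(self, lookup: Callable[[str], bool], sample_rate: float = 0.01,
                 cache_ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None):
        self.lookup = lookup              # remote call: token_id -> True if revoked
        self.sample_rate = sample_rate    # fraction of uncached requests that get checked
        self.cache_ttl_s = cache_ttl_s
        self.clock = clock
        self.rng = rng if rng is not None else random.Random()
        self.cache: Dict[str, Tuple[bool, float]] = {}  # token_id -> (revoked, expiry)
        self.remote_calls = 0

    def is_revoked(self, token_id: str) -> bool:
        now = self.clock()
        cached = self.cache.get(token_id)
        if cached is not None and cached[1] > now:
            return cached[0]
        # Unsampled requests pass without a remote check: this is the scaling trade-off.
        if self.rng.random() >= self.sample_rate:
            return False
        self.remote_calls += 1
        revoked = self.lookup(token_id)
        self.cache[token_id] = (revoked, now + self.cache_ttl_s)
        return revoked


def expected_lookups_per_second(requests_per_second: float, sample_rate: float) -> float:
    """Upper bound on remote lookups, ignoring cache hits."""
    return requests_per_second * sample_rate
```

At 50,000 requests/s and 1% sampling, that is still up to 500 remote lookups/s before caching, and that load lands on whatever backs `lookup`.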

I don't think anyone pretends that keytabs are an ideal solution. I know some cluster ops
teams will be unhappy about this, but I also think that saying "near-indefinite Kerberos tokens"
isn't going to make those people happy either.

There's another option, which we looked at for Slider: pushing out new tokens from the client,
just as the RM does token renewal today. You have to remember to refresh them regularly,
and be able to get those tokens to the processes in the YARN containers, processes that may
then want to switch over to them. I could imagine this working, though, with Oozie jobs
scheduled to do the renewal, and something in YARN to help with token propagation.
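A sketch of the client-side half of that option, under stated assumptions: `fetch_token` stands in for whatever obtains a fresh delegation token (for instance a scheduled Oozie job) and `publish` for whatever propagates it into the YARN containers. Both, along with `TokenRefresher` and the 75% refresh point, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    token_id: str
    issued_at: float   # seconds
    expires_at: float  # seconds


class TokenRefresher:
    """Hypothetical client-side loop: fetch fresh tokens before expiry and push them out."""

    def __init__(self, fetch_token: Callable[[float], Token],
                 publish: Callable[[Token], None], refresh_fraction: float = 0.75):
        self.fetch_token = fetch_token        # e.g. invoked by a scheduled job
        self.publish = publish                # e.g. propagates the token to containers
        self.refresh_fraction = refresh_fraction
        self.current: Optional[Token] = None

    def next_refresh_at(self) -> float:
        # Requires a current token: refresh once this fraction of its lifetime has passed.
        t = self.current
        return t.issued_at + self.refresh_fraction * (t.expires_at - t.issued_at)

    def tick(self, now: float) -> bool:
        """Fetch and publish a new token if there is none yet or the refresh point has passed."""
        if self.current is None or now >= self.next_refresh_at():
            self.current = self.fetch_token(now)
            self.publish(self.current)
            return True
        return False
```

Refreshing at a fixed fraction of the token's lifetime leaves headroom to retry a failed fetch before the old token expires. The containers still have to notice the new token and switch to it, which is the propagation problem described above.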

> Support arbitrary max expiration times for delegation token
> -----------------------------------------------------------
>                 Key: HDFS-7295
>                 URL: https://issues.apache.org/jira/browse/HDFS-7295
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Anubhav Dhoot
>            Assignee: Anubhav Dhoot
> Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is
> a problem for different users of HDFS such as long-running YARN apps. Users should be allowed
> to optionally specify max lifetime for their tokens.

This message was sent by Atlassian JIRA
