spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <>
Subject [jira] [Resolved] (SPARK-14743) Improve delegation token handling in secure clusters
Date Wed, 10 Aug 2016 22:41:20 GMT


Marcelo Vanzin resolved SPARK-14743.
       Resolution: Fixed
         Assignee: Saisai Shao
    Fix Version/s: 2.1.0

> Improve delegation token handling in secure clusters
> ----------------------------------------------------
>                 Key: SPARK-14743
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>            Assignee: Saisai Shao
>             Fix For: 2.1.0
> In a way, I'd consider this a parent bug of SPARK-7252.
> Spark's current support for delegation tokens is a little all over the place:
> - for HDFS, there's support for re-creating tokens if a principal and keytab are provided
> - for HBase and Hive, Spark will fetch delegation tokens so that apps can work in cluster mode, but will not re-create them, so apps that need those will stop working after 7 days
> - for anything else, Spark doesn't do anything. Lots of other services use delegation tokens, and supporting them as data sources in Spark becomes more complicated because of that. e.g., Kafka will (hopefully) soon support them.
> It would be nice if Spark had consistent support for handling delegation tokens regardless of who needs them. I'd list these as the requirements:
> - Spark to provide a generic interface for fetching delegation tokens. This would allow Spark's delegation token support to be extended using some plugin architecture (e.g. Java services), meaning Spark itself doesn't need to support every possible service out there.
> This would be used to fetch tokens when launching apps in cluster mode, and when a principal and a keytab are provided to Spark.
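For illustration, the "Java services" plugin mechanism mentioned above could look roughly like this; a minimal sketch, with hypothetical interface and method names (not Spark's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical plugin interface: each supported service ships an
// implementation that knows how to fetch its own delegation tokens.
interface DelegationTokenProvider {
    String serviceName();            // e.g. "hive", "hbase", "kafka"
    boolean credentialsRequired();   // whether tokens are needed in this deployment
}

public class TokenProviderLoader {
    // Discover providers on the classpath via the standard Java services
    // mechanism (META-INF/services/DelegationTokenProvider entries), so
    // Spark itself doesn't need built-in knowledge of every service.
    public static List<DelegationTokenProvider> loadProviders() {
        List<DelegationTokenProvider> active = new ArrayList<>();
        for (DelegationTokenProvider p : ServiceLoader.load(DelegationTokenProvider.class)) {
            if (p.credentialsRequired()) {
                active.add(p);
            }
        }
        return active;
    }
}
```

A new data source (say, a Kafka connector) would then only need to publish its provider implementation on the classpath to participate in token fetching.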
> - A way to manually update delegation tokens in Spark. For example, a new SparkContext API, or some configuration that tells Spark to monitor a file for changes and load tokens from said file.
> This would allow external applications to manage tokens outside of Spark and be able to update a running Spark application (think, for example, a job server like Oozie, or something like Hive-on-Spark, which manages Spark apps running remotely).
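The file-monitoring variant of that requirement could be sketched as a simple modification-time poll; class and method names here are hypothetical, and in Spark the reload step would read the Hadoop token file into the current UserGroupInformation:

```java
import java.io.File;

// Minimal sketch: an external manager (e.g. Oozie) writes refreshed tokens
// to a well-known file; Spark polls the file's modification time and reloads
// credentials whenever it changes.
public class TokenFileMonitor {
    private final File tokenFile;
    private long lastSeen;

    public TokenFileMonitor(File tokenFile) {
        this.tokenFile = tokenFile;
        this.lastSeen = tokenFile.lastModified();
    }

    /** Returns true if the token file changed since the last check. */
    public boolean checkForUpdate() {
        long current = tokenFile.lastModified();
        if (current > lastSeen) {
            lastSeen = current;
            // This is where the new tokens would actually be loaded,
            // e.g. by reading a Hadoop token storage file into the
            // running application's credentials.
            return true;
        }
        return false;
    }
}
```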
> - A way to notify running code that new delegation tokens have been loaded.
> This may not be strictly necessary; it might be possible for code to detect that, e.g., by peeking into the UserGroupInformation structure. But an event sent to the listener bus would allow applications to react when new tokens are available (e.g., the Hive backend could re-create connections to the metastore server using the new tokens).
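The listener-bus notification could be as simple as the following sketch (event and listener names are made up for illustration, not Spark's listener API): Spark posts an event when fresh tokens are loaded, and interested code reacts to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event posted when new delegation tokens have been loaded.
class TokensUpdatedEvent {
    final long issueTime;
    TokensUpdatedEvent(long issueTime) { this.issueTime = issueTime; }
}

// Applications register a listener and react, e.g. a Hive backend could
// re-create its metastore connections using the new tokens.
interface TokenListener {
    void onTokensUpdated(TokensUpdatedEvent event);
}

public class TokenEventBus {
    private final List<TokenListener> listeners = new ArrayList<>();

    public void addListener(TokenListener l) { listeners.add(l); }

    // Deliver the event to every registered listener.
    public void post(TokensUpdatedEvent event) {
        for (TokenListener l : listeners) {
            l.onTokensUpdated(event);
        }
    }
}
```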
> Also, cc'ing [~busbey] and [~steve_l], since you've talked about this on the mailing list.

This message was sent by Atlassian JIRA

