spark-issues mailing list archives

From "Kent Yao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-20060) Support Standalone visiting secured HDFS
Date Sat, 25 Mar 2017 16:37:41 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kent Yao updated SPARK-20060:
-----------------------------
    Description: 
h1. Brief design

h2. Introductions
The basic issue for Standalone mode in accessing Kerberos-secured HDFS or other kerberized services
is how to gather the delegation tokens on the driver side and deliver them to the executor
side.

When we run Spark on Yarn, we set the tokens on the container launch context so they are delivered
automatically; the long-running-application issue caused by token expiration was fixed
in SPARK-14743 by writing the tokens to HDFS and repeatedly updating and renewing
the credential file.
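The Yarn delivery path mentioned above works by serializing the driver's credentials into the container launch context. A minimal sketch of that mechanism, using Hadoop/Yarn public APIs (the method name and structure here are illustrative, not the actual Spark code):

```scala
import java.nio.ByteBuffer
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

// Sketch: serialize the current user's credentials and attach them to the
// container launch context; Yarn then localizes them for the container.
def setupTokens(ctx: ContainerLaunchContext): Unit = {
  val credentials = UserGroupInformation.getCurrentUser.getCredentials
  val dob = new DataOutputBuffer()
  credentials.writeTokenStorageToStream(dob)
  ctx.setTokens(ByteBuffer.wrap(dob.getData, 0, dob.getLength))
}
```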

When running Spark on Standalone, we currently have no mechanism like Yarn's to obtain and deliver
those tokens.

h2. Implementations

Firstly, we move the implementation of SPARK-14743, which is currently Yarn-only, to the core module.
We use it both to gather the credentials we need and to update and renew the
credential files on HDFS.
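Gathering HDFS delegation tokens on the driver side could look like the following sketch, built on Hadoop's `FileSystem.addDelegationTokens` API (the method name and the renewer parameter are illustrative assumptions, not the SPARK-14743 code itself):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.Credentials

// Sketch: obtain delegation tokens for the given paths on behalf of the
// locally kinit'ed user and collect them into a Credentials object, which
// could then be written to a credential file on HDFS for later renewal.
def obtainHdfsTokens(paths: Seq[Path], renewer: String,
                     hadoopConf: Configuration): Credentials = {
  val creds = new Credentials()
  paths.foreach { p =>
    val fs = p.getFileSystem(hadoopConf)
    fs.addDelegationTokens(renewer, creds)
  }
  creds
}
```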

Secondly, credential files on secured HDFS are not reachable for executors before they obtain the
tokens. Here we add a sequence configuration `spark.deploy.credential.entities`, which the
driver populates with `token.encodeToUrlString()` before launching the executors; the
executors fetch these credentials as a string sequence while fetching the driver-side
Spark properties, and then decode them back to tokens. Before setting up the `CoarseGrainedExecutorBackend`,
we set the credentials on the current executor-side UGI.
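The round trip described above could be sketched as follows. The `encodeToUrlString`/`decodeFromUrlString` calls are Hadoop's public `Token` API; the helper names and the split into driver/executor functions are illustrative assumptions:

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

// Driver side: encode each token as a URL-safe string, suitable for
// putting into the proposed spark.deploy.credential.entities property.
def encodeTokens(tokens: Seq[Token[_ <: TokenIdentifier]]): Seq[String] =
  tokens.map(_.encodeToUrlString())

// Executor side: decode the strings back into tokens and add them to the
// current UGI before CoarseGrainedExecutorBackend is set up.
def installTokens(encoded: Seq[String]): Unit = {
  val ugi = UserGroupInformation.getCurrentUser
  encoded.foreach { s =>
    val token = new Token[TokenIdentifier]()
    token.decodeFromUrlString(s)
    ugi.addToken(token)
  }
}
```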



  was:For **Spark on non-Yarn** mode on a kerberized HDFS, we don't obtain credentials from
the Hive metastore, HDFS, etc., and just use the locally kinit'ed user to connect to them. But if
we specify the --proxy-user argument in non-Yarn mode, such as local or standalone, after we simply
use `UGI.createProxyUser` to get a proxy UGI as the effective user and wrap the code in doAs,
the proxy UGI fails to talk to the Hive metastore because it has no credentials. Thus, we need to obtain
credentials via the real user and add them to the proxy UGI.
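The fix sketched in that earlier description — obtaining credentials via the real user and adding them to the proxy UGI before entering doAs — might look like this (a hedged illustration using Hadoop's `UserGroupInformation` API; the helper name and the `obtainCreds` callback are hypothetical):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Sketch: create the proxy UGI, let the real (kinit'ed) user obtain the
// needed delegation tokens, hand them to the proxy UGI, then run user
// code inside doAs so the effective user can reach kerberized services.
def runAsProxy[T](proxyUser: String, obtainCreds: () => Credentials)(body: => T): T = {
  val realUser = UserGroupInformation.getCurrentUser
  val proxyUgi = UserGroupInformation.createProxyUser(proxyUser, realUser)
  proxyUgi.addCredentials(obtainCreds()) // tokens fetched by the real user
  proxyUgi.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = body
  })
}
```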

    Component/s:     (was: Spark Submit)
                 Spark Core
     Issue Type: New Feature  (was: Bug)
        Summary: Support Standalone visiting secured HDFS   (was: Spark On Non-Yarn Mode with
Kerberized HDFS ProxyUser Fails Talking to Hive MetaStore )

> Support Standalone visiting secured HDFS 
> -----------------------------------------
>
>                 Key: SPARK-20060
>                 URL: https://issues.apache.org/jira/browse/SPARK-20060
>             Project: Spark
>          Issue Type: New Feature
>          Components: Deploy, Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Kent Yao
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

