hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
Date Tue, 14 Jan 2014 15:56:52 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870835#comment-13870835 ]

Alejandro Abdelnur commented on MAPREDUCE-5663:

The Oozie server is responsible for obtaining all the tokens the main job may need:

* tokens to run the job (working dir, job tokens)
* tokens for the input and output data (typically HDFS tokens, but they can be for different
file systems, for HBase, for HCatalog, etc.).

For the typical case of running an MR job (directly or via Pig/Hive), the tokens of the launcher
job are sufficient for the main job; they just need to be propagated. The Oozie server makes
sure the "mapreduce.job.complete.cancel.delegation.tokens" property is set to false for the
launcher job, so that its tokens are not cancelled when it completes (Oozie gets rid of the
launcher job for MR jobs once the main job is running).

For scenarios where the main job needs to interact with different services, Oozie must acquire
the tokens for those services in advance. For HDFS this is done by simply setting the
"MRJobConfig.JOB_NAMENODES" property; the launcher job submission will then get those tokens.
For HBase or HCatalog, Oozie has a CredentialsProvider that obtains those tokens (the
requirement here is that Oozie is configured as a proxy user in those services in order to get
tokens for the user submitting the job).
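
As a rough illustration of the two settings described above, the launcher job configuration could be assembled as in the sketch below. A plain Map stands in for Hadoop's Configuration so the example is self-contained; LauncherConfSketch and launcherConf are hypothetical names, and the literal key used for MRJobConfig.JOB_NAMENODES ("mapreduce.job.hdfs-servers") is stated from memory and should be checked against the Hadoop version in use.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a Map stands in for org.apache.hadoop.conf.Configuration.
final class LauncherConfSketch {
    // Property named in the message above.
    static final String CANCEL_TOKENS =
        "mapreduce.job.complete.cancel.delegation.tokens";
    // Assumed literal value of MRJobConfig.JOB_NAMENODES; verify for your release.
    static final String JOB_NAMENODES = "mapreduce.job.hdfs-servers";

    // Builds the launcher settings: keep tokens alive after the launcher ends,
    // and list any extra NameNodes whose tokens the main job will need.
    static Map<String, String> launcherConf(List<String> extraNameNodes) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(CANCEL_TOKENS, "false");
        if (!extraNameNodes.isEmpty()) {
            conf.put(JOB_NAMENODES, String.join(",", extraNameNodes));
        }
        return conf;
    }
}
```

With a real Configuration object the same two keys would be set through its setter methods before submitting the launcher job.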

From what it seems, you are after generalizing this. I think we should do it with a slight
twist from what you are proposing:

* Delegation tokens should always be requested by the client, whether or not security is
enabled and whether or not the splits are computed on the client.
* Delegation token fetching should be done regardless of the IF/OF implementation (take the
case of talking with HBase or HCatalog, or the job working dir service).
* Delegation token fetching should not be tied to split computation.

We could have a utility class that takes a UGI and a list of service URIs and returns a
Credentials object populated with tokens for all the specified services.

The IF/OF/Job would have to be able to extract the required URIs for the job.

Also, this mechanism could be used to obtain ALL tokens the AM needs.
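
To make the proposed utility class a bit more concrete, here is a minimal sketch of its shape. It is only an illustration, not the patch or any existing Hadoop API: SketchToken, SketchCredentials, SketchTokenFetcher and DelegationTokenCollector are hypothetical stand-ins (a String user replaces the UGI, and SketchCredentials replaces Hadoop's Credentials) so that the example compiles on its own.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a delegation token.
final class SketchToken {
    final String service;
    final String owner;
    SketchToken(String service, String owner) {
        this.service = service;
        this.owner = owner;
    }
}

// Hypothetical stand-in for Hadoop's Credentials: tokens keyed by service.
final class SketchCredentials {
    private final Map<String, SketchToken> tokens = new LinkedHashMap<>();
    void addToken(String service, SketchToken token) { tokens.put(service, token); }
    SketchToken getToken(String service) { return tokens.get(service); }
    int numberOfTokens() { return tokens.size(); }
}

// Hypothetical per-scheme fetcher (one for hdfs, one for hbase, and so on).
interface SketchTokenFetcher {
    SketchToken fetch(String user, URI service);
}

// The utility: given a user and service URIs, return populated credentials.
// It knows nothing about InputFormats, OutputFormats or split computation.
final class DelegationTokenCollector {
    private final Map<String, SketchTokenFetcher> fetchersByScheme = new LinkedHashMap<>();

    void register(String scheme, SketchTokenFetcher fetcher) {
        fetchersByScheme.put(scheme, fetcher);
    }

    SketchCredentials collect(String user, List<URI> services) {
        SketchCredentials creds = new SketchCredentials();
        for (URI service : services) {
            SketchTokenFetcher fetcher = fetchersByScheme.get(service.getScheme());
            if (fetcher == null) {
                throw new IllegalArgumentException("No token fetcher for " + service);
            }
            // One token per scheme and authority, however many paths point at it.
            String key = service.getScheme() + "://" + service.getAuthority();
            if (creds.getToken(key) == null) {
                creds.addToken(key, fetcher.fetch(user, service));
            }
        }
        return creds;
    }
}
```

In this shape the IF/OF/Job only has to report the URIs it needs; the client (or Oozie acting as proxy user) calls collect once and ships the resulting credentials with the job.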

> Add an interface to Input/Ouput Formats to obtain delegation tokens
> -------------------------------------------------------------------
>                 Key: MAPREDUCE-5663
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Michael Weng
>         Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3
>
> Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively.
> This works as long as the splits are generated on a node with kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required.

This message was sent by Atlassian JIRA
