hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5663) Add an interface to Input/Ouput Formats to obtain delegation tokens
Date Wed, 15 Jan 2014 20:18:22 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872542#comment-13872542 ]

Alejandro Abdelnur commented on MAPREDUCE-5663:

bq. ... I’m not too sure about - mainly from the perspective of services not handling getToken
requests correctly if security is disabled

We are moving away from this: in Yarn we always use tokens, regardless of the security configuration.
Oozie needs tokens to be there in order to work correctly.

bq. ... The JobClient currently doesn't do this, at least for HDFS.

Actually, yes, it does this if you set the {{MRJobConfig.JOB_NAMENODES}} property. This is done
in the {{JobSubmitter#populateTokenCache()}} method, which is called by {{JobSubmitter#submitJobInternal()}},
which in turn is called by {{Job#submit()}}. All of this happens on the main submission path,
so it is always done on submit, independently of split computation.
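For illustration, a minimal JDK-only sketch of the first step on that path: turning the comma-separated value of the property named by {{MRJobConfig.JOB_NAMENODES}} ({{mapreduce.job.hdfs-servers}}) into URIs. {{java.util.Properties}} stands in for Hadoop's {{Configuration}}, and the {{NamenodeUris}} class and {{fromConf}} method are hypothetical; the actual {{populateTokenCache()}} reads the property with {{Configuration#getStrings}} and passes the resulting paths to {{TokenCache.obtainTokensForNamenodes}}.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class NamenodeUris {
  // Configuration key behind MRJobConfig.JOB_NAMENODES.
  static final String JOB_NAMENODES = "mapreduce.job.hdfs-servers";

  // Splits the comma-separated property value into URIs, skipping blank entries.
  static List<URI> fromConf(Properties conf) {
    List<URI> uris = new ArrayList<>();
    String value = conf.getProperty(JOB_NAMENODES);
    if (value == null) {
      return uris; // property not set: no extra namenodes to get tokens from
    }
    for (String entry : value.split(",")) {
      String trimmed = entry.trim();
      if (!trimmed.isEmpty()) {
        uris.add(URI.create(trimmed));
      }
    }
    return uris;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(JOB_NAMENODES, "hdfs://nn1.example.com:8020, hdfs://nn2.example.com:8020");
    // prints [hdfs://nn1.example.com:8020, hdfs://nn2.example.com:8020]
    System.out.println(fromConf(conf));
  }
}
```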

bq. ... For HBase / HCatalog sources which are outside of the IF/OF for a MR job - I don't
think we have the capability for fetching tokens, and rely on the user providing them up front.

Actually, we are fetching them up front only because this was needed for MR jobs, but MR shouldn’t
be a special case. Oozie has the concept of a {{CredentialsProvider}} for this very reason,
and I think with this JIRA we can fix this for the general case.

bq. ... Would this utility class know how to handle all kinds of URIs ?

Yes, based on handlers registered for the different URI schemes; more on this below.

My thinking is to address this with the same pattern we use today for loading and registering
{{FileSystem}}, {{CompressionCodec}}, {{TokenRenewer}} and {{SecurityInfo}} implementations:
use the JDK’s {{ServiceLoader}} mechanism to load all available implementations of the following
interface:
```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;

/** Implementations must be thread-safe. */
public interface CredentialsProvider {

  /** Reports the scheme supported by this provider. */
  String getScheme();

  /**
   * Obtains delegation tokens for the provided URIs.
   * @param conf configuration used to initialize the components that connect to the specified services.
   * @param uris URIs of the services to obtain delegation tokens from.
   * @param targetCredentials credentials to add the fetched delegation tokens to.
   */
  void obtainCredentials(Configuration conf, URI[] uris, Credentials targetCredentials)
      throws IOException;
}
```
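To make the contract concrete, a minimal sketch of one provider. It is self-contained, so a {{Map<String, String>}} stands in for {{Configuration}} and the hypothetical {{TokenStore}} stands in for {{Credentials}}; the {{example}} scheme, the {{example.token.renewer}} key and the token strings are made up. The provider keeps no mutable fields, which is the simplest way to meet the thread-safety requirement.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class ExampleProviderSketch {
  // Hypothetical stand-in for Hadoop's Credentials: service name -> token.
  static class TokenStore {
    final Map<String, String> tokens = new HashMap<>();
  }

  // The proposed interface, with a plain Map standing in for Configuration.
  interface CredentialsProvider {
    String getScheme();
    void obtainCredentials(Map<String, String> conf, URI[] uris, TokenStore targetCredentials)
        throws IOException;
  }

  // Provider for a made-up "example" scheme. It has no mutable fields,
  // so one shared instance can be called from several threads.
  static class ExampleCredentialsProvider implements CredentialsProvider {
    @Override
    public String getScheme() {
      return "example";
    }

    @Override
    public void obtainCredentials(Map<String, String> conf, URI[] uris,
        TokenStore targetCredentials) throws IOException {
      String renewer = conf.getOrDefault("example.token.renewer", "yarn"); // hypothetical key
      for (URI uri : uris) {
        if (!getScheme().equals(uri.getScheme())) {
          throw new IOException("Not an " + getScheme() + " URI: " + uri);
        }
        // A real provider would ask the service for a delegation token here.
        String service = uri.getAuthority();
        targetCredentials.tokens.put(service, "token(" + service + ", renewer=" + renewer + ")");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    TokenStore creds = new TokenStore();
    new ExampleCredentialsProvider().obtainCredentials(
        new HashMap<>(), new URI[] {URI.create("example://svc1:1234")}, creds);
    System.out.println(creds.tokens.get("svc1:1234")); // prints token(svc1:1234, renewer=yarn)
  }
}
```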

Then we would have a utility class that uses a {{ServiceLoader}} to load all {{CredentialsProvider}}
implementations available in the classpath. The nice thing about the ServiceLoader mechanism is that
you drop in a JAR file with a service implementation and you don’t have to configure anything; it
just works, provided the JAR has the META-INF/services/... file for it. The loading would be done
in a static initializer block of the class.
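A minimal sketch of such a static-initializer registry. The class name {{CredentialsProviders}} and the {{index}} and {{providerFor}} helpers are hypothetical, the interface is trimmed to {{getScheme()}} to keep it short, and the first-one-wins policy for two providers claiming the same scheme is an assumption (the proposal does not say). Implementations would be listed, one fully qualified class name per line, in a {{META-INF/services/}} file named after the interface's binary name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

class CredentialsProviders {
  // Trimmed-down provider contract for this sketch (obtainCredentials omitted).
  interface CredentialsProvider {
    String getScheme();
  }

  // Scheme -> provider, built once when the class is initialized.
  private static final Map<String, CredentialsProvider> PROVIDERS;

  static {
    // Discovers every implementation listed in META-INF/services files on the classpath.
    PROVIDERS = index(ServiceLoader.load(CredentialsProvider.class));
  }

  // Indexes providers by scheme; in this sketch the first provider seen for a scheme wins.
  static Map<String, CredentialsProvider> index(Iterable<? extends CredentialsProvider> providers) {
    Map<String, CredentialsProvider> byScheme = new HashMap<>();
    for (CredentialsProvider provider : providers) {
      byScheme.putIfAbsent(provider.getScheme(), provider);
    }
    return Collections.unmodifiableMap(byScheme);
  }

  // Returns the provider registered for the scheme, or null if none was found.
  static CredentialsProvider providerFor(String scheme) {
    return PROVIDERS.get(scheme);
  }

  public static void main(String[] args) {
    // Prints the schemes for which a provider JAR is on the classpath.
    System.out.println(PROVIDERS.keySet());
  }
}
```

Keeping {{index}} separate from the static block means the scheme-indexing logic can be exercised without dropping any JARs on the classpath.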

This utility class would have a static method {{fetchCredentials(Configuration, URI[], Credentials)}}
that groups the URIs by scheme and then invokes the corresponding {{CredentialsProvider}}
implementation for each scheme.
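A minimal sketch of that dispatch step, assuming an unregistered scheme should be an error. To stay self-contained, a {{Map<String, String>}} stands in for {{Configuration}}, the hypothetical {{TokenStore}} stands in for {{Credentials}}, and the scheme-to-provider map is passed in explicitly rather than read from a static registry.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CredentialsFetcher {
  // Hypothetical stand-in for Hadoop's Credentials: service name -> token.
  static class TokenStore {
    final Map<String, String> tokens = new HashMap<>();
  }

  // Provider contract from the proposal, with stand-in types.
  interface CredentialsProvider {
    String getScheme();
    void obtainCredentials(Map<String, String> conf, URI[] uris, TokenStore target)
        throws IOException;
  }

  // Groups URIs by scheme, then calls each scheme's provider once with all of its URIs.
  static void fetchCredentials(Map<String, CredentialsProvider> providers,
      Map<String, String> conf, URI[] uris, TokenStore target) throws IOException {
    Map<String, List<URI>> byScheme = new LinkedHashMap<>();
    for (URI uri : uris) {
      String scheme = uri.getScheme();
      if (scheme == null) {
        throw new IOException("URI has no scheme: " + uri);
      }
      byScheme.computeIfAbsent(scheme, s -> new ArrayList<>()).add(uri);
    }
    for (Map.Entry<String, List<URI>> entry : byScheme.entrySet()) {
      CredentialsProvider provider = providers.get(entry.getKey());
      if (provider == null) {
        throw new IOException("No CredentialsProvider registered for scheme " + entry.getKey());
      }
      provider.obtainCredentials(conf, entry.getValue().toArray(new URI[0]), target);
    }
  }

  public static void main(String[] args) throws IOException {
    // Toy provider that records one fake token per URI authority.
    CredentialsProvider fake = new CredentialsProvider() {
      public String getScheme() { return "fake"; }
      public void obtainCredentials(Map<String, String> conf, URI[] uris, TokenStore target) {
        for (URI uri : uris) {
          target.tokens.put(uri.getAuthority(), "token-" + uri.getAuthority());
        }
      }
    };
    Map<String, CredentialsProvider> providers = new HashMap<>();
    providers.put(fake.getScheme(), fake);
    TokenStore creds = new TokenStore();
    fetchCredentials(providers, new HashMap<>(),
        new URI[] {URI.create("fake://a:1"), URI.create("fake://b:2")}, creds);
    System.out.println(creds.tokens.keySet());
  }
}
```

Grouping first means each provider sees all URIs for its scheme in a single call, so it can reuse one client connection if it wants to.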

Each Yarn application would then define a configuration property listing the URIs of the services
to get tokens from, and its client submission code would fetch them (as the {{JobSubmitter}} does with
{{MRJobConfig.JOB_NAMENODES}}, but in a general way). Frameworks may choose to be smarter: MR, for
example, could get the URIs from the splits and the output dir and obtain the tokens automatically.
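For MR, the smarter variant could reduce the split paths and the output dir to the distinct services they live on. A minimal sketch, assuming the paths are already fully qualified URIs; URIs without a scheme or authority (such as {{file:///...}}) are skipped here on the assumption that they point at no remote service. {{TokenTargets}} and {{serviceUris}} are hypothetical names, and the resulting set is what would be handed to {{fetchCredentials}}.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TokenTargets {
  // Collects the distinct scheme://authority pairs of the input paths and the output dir.
  static Set<URI> serviceUris(List<URI> inputPaths, URI outputDir) {
    Set<URI> services = new LinkedHashSet<>();
    for (URI path : inputPaths) {
      addService(services, path);
    }
    addService(services, outputDir);
    return services;
  }

  // Keeps only scheme and authority; URIs lacking either are skipped in this sketch.
  private static void addService(Set<URI> services, URI path) {
    if (path.getScheme() == null || path.getRawAuthority() == null) {
      return;
    }
    services.add(URI.create(path.getScheme() + "://" + path.getRawAuthority()));
  }

  public static void main(String[] args) {
    List<URI> splits = Arrays.asList(
        URI.create("hdfs://nn1:8020/data/part-0"),
        URI.create("hdfs://nn1:8020/data/part-1"),
        URI.create("hdfs://nn2:8020/logs/day1"));
    // prints [hdfs://nn1:8020, hdfs://nn2:8020, hdfs://nn3:8020]
    System.out.println(serviceUris(splits, URI.create("hdfs://nn3:8020/out")));
  }
}
```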

> Add an interface to Input/Ouput Formats to obtain delegation tokens
> -------------------------------------------------------------------
>                 Key: MAPREDUCE-5663
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5663
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Michael Weng
>         Attachments: MAPREDUCE-5663.4.txt, MAPREDUCE-5663.5.txt, MAPREDUCE-5663.6.txt, MAPREDUCE-5663.patch.txt, MAPREDUCE-5663.patch.txt2, MAPREDUCE-5663.patch.txt3
> Currently, delegation tokens are obtained as part of the getSplits / checkOutputSpecs calls to the InputFormat / OutputFormat respectively.
> This works as long as the splits are generated on a node with Kerberos credentials. For split generation elsewhere (AM for example), an explicit interface is required.
