hadoop-mapreduce-issues mailing list archives

From "Michael Gummelt (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-6876) FileInputFormat.listStatus should not fetch delegation tokens
Date Fri, 14 Apr 2017 18:48:41 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969374#comment-15969374 ]

Michael Gummelt edited comment on MAPREDUCE-6876 at 4/14/17 6:48 PM:
---------------------------------------------------------------------

bq. The input format must obtain the necessary tokens for the tasks to be able to access the
input splits, and this is how FileInputFormat accomplishes that.

But the {{FileInputFormat}} is just fetching split information. It doesn't create tasks,
so it shouldn't need to fetch delegation tokens. That should be the responsibility of the
job-submitting code.

As it is, client code that just creates a {{FileInputFormat}} in order to fetch split
information, as we do in Spark, is forced to fetch delegation tokens it doesn't need.

I'm not saying that delegation tokens aren't eventually needed for MapReduce jobs; it's just
that this seems like the wrong place to fetch them.
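
To make the proposed separation concrete, here is a minimal sketch in plain Java. It is a toy model, not Hadoop code: {{SplitPlanningSketch}}, {{Credentials}}, {{Split}}, {{listSplits}} and {{submitJob}} are all hypothetical names invented for illustration, and the token string is a placeholder. The only point it shows is that split listing can leave credentials untouched while the submitting code, which actually launches tasks, obtains the tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these types are Hadoop classes.
public class SplitPlanningSketch {

    /** Stand-in for job credentials: maps a file system name to a token string. */
    static final class Credentials {
        private final Map<String, String> tokens = new HashMap<>();

        void addToken(String fileSystem, String token) {
            tokens.put(fileSystem, token);
        }

        int numberOfTokens() {
            return tokens.size();
        }
    }

    /** Stand-in for an input split: a byte range of one file. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Split planning only: takes no credentials, so it cannot fetch tokens. */
    static List<Split> listSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /** Job submission: the one place that obtains tokens, because it launches tasks. */
    static List<Split> submitJob(String fileSystem, String path, long fileLength,
                                 long splitSize, Credentials credentials) {
        // Placeholder token; a real system would contact the file system here.
        credentials.addToken(fileSystem, "token-for-" + fileSystem);
        return listSplits(path, fileLength, splitSize);
    }

    public static void main(String[] args) {
        // A Spark-like caller that only wants split information.
        List<Split> splits = listSplits("/data/input.txt", 250, 100);
        System.out.println("splits planned: " + splits.size());

        // A MapReduce-like caller that submits a job and therefore needs tokens.
        Credentials credentials = new Credentials();
        submitJob("hdfs://nn1", "/data/input.txt", 250, 100, credentials);
        System.out.println("tokens after submission: " + credentials.numberOfTokens());
    }
}
```

In this sketch the split-only caller never constructs a {{Credentials}} object at all, which is the behavior the comment argues for; whether the real {{FileInputFormat}} can be restructured this way is a separate question for the issue.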



> FileInputFormat.listStatus should not fetch delegation tokens
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-6876
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6876
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Michael Gummelt
>
> {{FileInputFormat.listStatus}} fetches delegation tokens: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213
> AFAICT, this is unnecessary.  {{listStatus}} doesn't delegate those tokens to another
> process.  This is causing issues described in the attached Spark Kerberos ticket, because
> {{TokenCache.obtainTokensForNameNodes}}, which is used to fetch the delegation tokens, assumes
> that certain MapReduce configuration variables are set, which isn't true in the Spark calling
> code.  This is a separate problem, but nonetheless it wouldn't have arisen if {{listStatus}}
> weren't fetching delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

