hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6565) Configuration to use host name in delegation token service is not read from job.xml during MapReduce job execution.
Date Fri, 18 Nov 2016 21:39:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677831#comment-15677831 ]

Jason Lowe commented on MAPREDUCE-6565:
---------------------------------------

For 3.x, I'm +1 for making all client-side settings override anything in any site file that isn't
marked final.  I'm a bit hesitant for 2.x given the long-standing semantics for some of these
properties, and in any case there needs to be a clear release note explaining to users what
to expect with the change.

Hmm, there might be a problem with adding job.xml as a default resource, and it has to do with
the relative ordering of when job.xml is added versus when other default resources, like the
*-site.xml files, are added.  The various site.xml files are only added as defaults when the related
classes are loaded (e.g.: HdfsConfiguration, YarnConfiguration, JobConf, etc.).  If we add job.xml
as a default resource _before_ some of these classes are touched, then some site files will
override job.xml because they'll be loaded later.  We can probably get the ordering
right for all the site files provided by core Hadoop, but I'm worried about downstream projects
that may have their own site files (e.g.: hive-site.xml).  Client-side settings could be smashed
by site settings if the ordering is not correct.  job.xml would need to be the last default
resource added, and we may not be able to guarantee that with arbitrary downstream code.
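A minimal sketch of the ordering concern, assuming a simplified model of default resources: they are applied in registration order, a later resource overrides an earlier one, and a key declared final by an earlier resource cannot be overridden. This is a plain-Java stand-in, not Hadoop's Configuration class; the class names and the example.* property keys are hypothetical, chosen only to show what happens when a downstream site file is registered after job.xml.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of default-resource ordering; not Hadoop's Configuration. */
public class DefaultResourceOrder {

    /** A named resource such as "core-site.xml": its properties and the keys it marks final. */
    static final class Resource {
        final String name;
        final Map<String, String> props;
        final Set<String> finalKeys;

        Resource(String name, Map<String, String> props, Set<String> finalKeys) {
            this.name = name;
            this.props = props;
            this.finalKeys = finalKeys;
        }
    }

    /** Replays resources in registration order; later values win unless the key is already final. */
    static Map<String, String> resolve(List<Resource> defaults) {
        Map<String, String> effective = new LinkedHashMap<>();
        Set<String> locked = new HashSet<>();
        for (Resource r : defaults) {
            for (Map.Entry<String, String> e : r.props.entrySet()) {
                String key = e.getKey();
                if (locked.contains(key)) {
                    continue; // an earlier resource declared this key final
                }
                effective.put(key, e.getValue());
                if (r.finalKeys.contains(key)) {
                    locked.add(key);
                }
            }
        }
        return effective;
    }

    /** Builds a registration order with job.xml either before or after a downstream site file. */
    static List<Resource> order(boolean jobXmlLast) {
        Resource coreSite = new Resource("core-site.xml",
                Map.of("hadoop.security.token.service.use_ip", "true",
                       "example.locked.key", "cluster"),
                Set.of("example.locked.key")); // marked final on the cluster
        Resource jobXml = new Resource("job.xml",
                Map.of("hadoop.security.token.service.use_ip", "false",
                       "example.downstream.key", "client",
                       "example.locked.key", "client"),
                Set.of());
        Resource downstreamSite = new Resource("hive-site.xml",
                Map.of("example.downstream.key", "site"),
                Set.of());

        List<Resource> defaults = new ArrayList<>();
        defaults.add(coreSite);
        if (jobXmlLast) {
            defaults.add(downstreamSite);
            defaults.add(jobXml);
        } else {
            defaults.add(jobXml);         // job.xml registered too early...
            defaults.add(downstreamSite); // ...so a site file loaded later overrides the client value
        }
        return defaults;
    }

    public static void main(String[] args) {
        System.out.println("job.xml early: " + resolve(order(false)));
        System.out.println("job.xml last:  " + resolve(order(true)));
    }
}
```

In this model, registering job.xml before hive-site.xml makes example.downstream.key resolve to the site value, registering it last restores the client value, and the key marked final in core-site.xml keeps its cluster value in both orders.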

Unfortunately, without using the default resource feature of Configuration, I don't know of
a straightforward way to get classes using plain ol' Configuration instances to see values
set in job.xml.  Any ideas here?  We can fix individual instances like hadoop.security.token.service.use_ip
in case-specific ways (i.e.: calling the SecurityUtil.setTokenServiceUseIp method for this
property), but not all cases will have a straightforward fix.  And we'd have to track them
all down individually.
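A minimal sketch of what a case-specific fix looks like, assuming a utility that caches its flag during static initialization from the node's own configuration, as the issue describes for SecurityUtil. TokenServiceSettings, nodeLocalConfig, and applyJobConf are hypothetical stand-ins, not Hadoop APIs; only the idea of explicitly pushing the job.xml value into the utility (the role SecurityUtil.setTokenServiceUseIp would play) comes from the comment.

```java
import java.util.Map;

/** Hypothetical illustration of fixing one statically cached setting from job.xml. */
public class CaseSpecificFix {

    /** Mimics a utility that reads its flag once, at class-load time, from node-local config. */
    static final class TokenServiceSettings {
        // Captured during static initialization; job.xml is never consulted here.
        private static boolean useIp = Boolean.parseBoolean(
                nodeLocalConfig().getOrDefault("hadoop.security.token.service.use_ip", "true"));

        static boolean useIp() {
            return useIp;
        }

        /** Analogous in spirit to SecurityUtil.setTokenServiceUseIp: overwrite the cached flag. */
        static void setTokenServiceUseIp(boolean flag) {
            useIp = flag;
        }
    }

    /** Hypothetical node-local config: with an isolated classpath, no core-site.xml is found. */
    static Map<String, String> nodeLocalConfig() {
        return Map.of();
    }

    /** The case-specific fix: after job.xml is loaded, push this one property into the utility. */
    static void applyJobConf(Map<String, String> jobXml) {
        String value = jobXml.get("hadoop.security.token.service.use_ip");
        if (value != null) {
            TokenServiceSettings.setTokenServiceUseIp(Boolean.parseBoolean(value));
        }
    }

    public static void main(String[] args) {
        System.out.println("before: useIp=" + TokenServiceSettings.useIp());
        applyJobConf(Map.of("hadoop.security.token.service.use_ip", "false"));
        System.out.println("after:  useIp=" + TokenServiceSettings.useIp());
    }
}
```

The drawback the comment points out is visible here: each cached setting needs its own applyJobConf-style hook, so every affected property has to be found and patched individually.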

> Configuration to use host name in delegation token service is not read from job.xml during MapReduce job execution.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6565
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6565
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Chris Nauroth
>            Assignee: Li Lu
>
> By default, the service field of a delegation token is populated based on server IP address.
> Setting {{hadoop.security.token.service.use_ip}} to {{false}} changes this behavior to use
> host name instead of IP address.  However, this configuration property is not read from job.xml.
> Instead, it's read from a separate {{Configuration}} instance created during static initialization
> of {{SecurityUtil}}.  This does not work correctly with MapReduce jobs if the framework is
> distributed by setting {{mapreduce.application.framework.path}} and the {{mapreduce.application.classpath}}
> is isolated to avoid reading core-site.xml from the cluster nodes.  MapReduce tasks will fail
> to authenticate to HDFS, because they'll try to find a delegation token based on the NameNode
> IP address, even though at job submission time the tokens were generated using the host name.
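The failure described in the issue can be sketched as a key mismatch in the credentials lookup. This assumes, for illustration only, that the service key is simply "host:port" or "ip:port"; the serviceKey helper, host name, IP address, and port are hypothetical, and the real code path goes through SecurityUtil rather than this helper.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a token stored under one service key and looked up under another. */
public class TokenServiceMismatch {

    /** Builds a service key from an address, mirroring the use_ip switch (hypothetical helper). */
    static String serviceKey(String host, String ip, int port, boolean useIp) {
        return (useIp ? ip : host) + ":" + port;
    }

    public static void main(String[] args) {
        String nnHost = "nn.example.com"; // hypothetical NameNode host name
        String nnIp = "10.0.0.5";         // hypothetical NameNode IP address
        int port = 8020;                  // hypothetical NameNode RPC port

        // Submission: the client read use_ip=false from its core-site.xml, so the token is keyed by host name.
        Map<String, String> credentials = new HashMap<>();
        credentials.put(serviceKey(nnHost, nnIp, port, false), "HDFS_DELEGATION_TOKEN");

        // Task: isolated classpath, no core-site.xml, so use_ip falls back to its default of true.
        String lookup = serviceKey(nnHost, nnIp, port, true);

        // The IP-based key was never stored, so no token is found and authentication fails.
        System.out.println("lookup " + lookup + " -> " + credentials.get(lookup));
    }
}
```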



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


