hadoop-user mailing list archives

From Steven Willis <swil...@compete.com>
Subject Understanding mapreduce.admin.user.env
Date Mon, 03 Nov 2014 22:14:20 GMT
I want to make sure that the native libraries installed on the Node Managers get used by all
YARN containers. I first found the mapreduce.admin.{map,reduce}.child.java.opts config properties
and set them to:

    '-Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native'

That is, I appended the native library paths to the default values of these properties. This
seemed to work, but now I see the warning:

    WARN mapred.YARNRunner: Usage of -Djava.library.path in mapreduce.admin.map.child.java.opts
    can cause programs to no longer function if hadoop native libraries are used. These values
    should be set as part of the LD_LIBRARY_PATH in the map JVM env using mapreduce.admin.user.env
    config settings.

Okay, so I can go and set mapreduce.admin.user.env, but before I do that I have a few questions.
Where are these properties actually read in and set? Are they read and set prior to the job
being submitted by the client code, on the host where "hadoop jar whatever.jar" is run? Or
are they set by the Resource Manager? Or by the Application Master? Or are they read on the
host where the map or reduce task actually runs?
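
For reference, a sketch of what the setting named in the warning might look like in
mapred-site.xml, reusing the native library paths from the java.library.path value above.
The property name comes from the warning itself; the VAR=value form of the value is an
assumption based on how mapred.child.env is documented, and the fragment is untested:

    <!-- mapred-site.xml (sketch only; which host's copy takes effect is the open question) -->
    <property>
      <name>mapreduce.admin.user.env</name>
      <value>LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native</value>
    </property>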

Imagine the following scenarios:

 A. The mapreduce.admin.user.env property is not set explicitly by the job's Java code prior
to submission. It is not set via command-line switches during submit. It is not set in /etc/hadoop/conf/*-site.xml
on the client host. It is not set in /etc/hadoop/conf/*-site.xml on the host running the Resource
Manager. It is not set in /etc/hadoop/conf/*-site.xml on the host that runs the Application
Master. But it is set in /etc/hadoop/conf/mapred-site.xml on the Node Manager host that runs
one of the map tasks.
 B. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml on the host
that runs the Application Master (not on any of the Node Managers that run the actual tasks).
 C. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml on the Resource
Manager host.
 D. Same as A, but the property is only set in /etc/hadoop/conf/mapred-site.xml on the client
submission host.
 E. Same as A, but the property is set either via command line switch, or in the client's
code (assuming these cases are the same as D).

In which cases will the map task see the default value for mapreduce.admin.user.env,
and when will it see the explicitly set value? What happens if it's explicitly set in more
than one of the locations referenced above?

And what about mapred.child.env? Where and how does that come into play?

What about yarn.app.mapreduce.am.env and yarn.app.mapreduce.am.admin.user.env? Will those
settings trickle down to the actual tasks, or do they only affect the Application Master's
environment? The same goes for yarn.nodemanager.admin-env: will it trickle down from the Node
Manager to the container? Would it be better to set one of these rather than the MapReduce
equivalent, so that I get the native libraries for all YARN apps, not just MapReduce ones?

-Steven Willis