hadoop-hdfs-user mailing list archives

From "Guttadauro, Jeff" <jeff.guttada...@here.com>
Subject yarn.application.classpath confusion...
Date Wed, 04 May 2016 20:45:55 GMT

My team is moving some Hadoop 1 jobs (running on an old AWS EMR AMI) to YARN / Hadoop
2 (on the newer AWS EMR Release 4.x).  We have an edge node with Hadoop 2.7.2 installed,
from which jobs are submitted to the cluster.  It appears that the yarn.application.classpath
property must be set in the yarn-site.xml file on the client (edge node) for our jobs to
be submitted successfully.  Otherwise, the jobs fail with the following error:
"java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/v2/app/MRAppMaster".
This has caused a lot of confusion.
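
For reference, the kind of client-side entry I mean looks roughly like the sketch below.
The value shown is an assumption: it is the stock Apache Hadoop 2.x default list as I
understand it, not our actual EMR value, and the install paths on a real cluster will
likely differ.

```xml
<!-- yarn-site.xml on the edge node (client). Illustrative sketch only. -->
<property>
  <name>yarn.application.classpath</name>
  <!-- Assumed value: stock Apache Hadoop 2.x default entries, kept on one line.
       An EMR cluster will probably use different directories. -->
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>
```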

Our understanding of property precedence is that Hadoop uses the first setting it finds
when looking in the following places, in order: (1) the Job/JobConf for the MR job, often
set programmatically, (2) *-site.xml files on the client machine, (3) *-site.xml files on
the cluster nodes, and finally (4) the *-default.xml files from the Hadoop installation.
So we are confused as to why it doesn't simply find no setting on the client and fall back
to the setting from yarn-site.xml on the cluster nodes.  That's how I would expect this
particular property to be most commonly used anyway, as it seems backwards for the client
to be telling YARN what its classpath should be on the cluster!  In fact, this is one of
those settings that I would expect to see commonly set to "final" on the cluster, since
you would want to prevent a client from providing its own value: a client shouldn't need
to know where things are installed on the cluster nodes.
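
To make the "final" idea concrete, this is roughly what I would have expected to find on
the cluster side.  It is only a sketch: the paths in the value are placeholders I made up,
and whether a cluster-side "final" actually constrains what a client submits is part of
what I am asking.

```xml
<!-- yarn-site.xml on the cluster nodes. Illustrative sketch only. -->
<property>
  <name>yarn.application.classpath</name>
  <!-- Placeholder paths: the real value would list the cluster's install directories. -->
  <value>/path/to/hadoop/conf,/path/to/hadoop/lib/*</value>
  <!-- Marks the property so that configuration resources loaded later cannot override it. -->
  <final>true</final>
</property>
```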

Perhaps I have a fundamental misunderstanding of something as we migrate to the new YARN
framework.  A lot of what I find online seems to talk about submitting jobs from the cluster
itself (typically from the master node), in which case it makes sense that this value should
be set.  But with an edge node setup like ours, I would think it should be fine to leave
that property unset.  Can you help me understand what's going on?

