hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
Date Thu, 13 Sep 2012 21:11:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455292#comment-13455292 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4421:
----------------------------------------------------

bq. This means the cluster setup should not have AMs site.xml files deployed in it.
bq. Then the 'yarn' client script should support a '-ampath=' (HDFS path to AM resources)
option and/or a '-amname=' (logical name of the AM resources, a new config file am-site.xml
in the cluster would have this mapping for blessed AMs, as suggested in my prev comment).

That is already possible today with a separate YARN_CONF_DIR. I haven't tested with separate
conf-dirs but can check right away. Essentially, MR has its own conf file (mapred-default.xml),
and dist-shell could have its own. We can argue either way about creating a new config file
am-site.xml for 'blessed' AMs.

bq. This means that the cluster setup should not have AMs JARs deployed in it.
This is already the case. I have a test cluster with HADOOP_YARN_HOME and HADOOP_MAPRED_HOME
kept separate. So yes, YARN doesn't have any mapred jars (except the shuffle-related ones, which
are not for the AMs).

bq. For one off custom AMs, JARs and config would be all provided by the client on submission.
bq. For AMs like MapReduce, DistributedShell and widely used AMs in a given cluster, their
JARs and config site.xml files would be in HDFS.
YARN doesn't care how the AM-related jars are managed. All it needs to know at the end of
the day is an FS location for all the jars needed by the app. So the jars can be managed in
three ways. The framework-specific clients can
 - pick up the AM jars from a public location on DFS and populate the dist-cache
 - or pick them up from a local installation on the client, upload them to a private location
on DFS and populate the dist-cache
 - or, if the AM jars happen to be installed on every node, construct the classpath referring
to those jars.

Today the MR AM implements the third option above; this JIRA is to enable the first two options.
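
The three options above can be sketched as a small decision model. This is only an illustration, not YARN or MapReduce code: the class AmJarPlan, the enum Source, and every path used are hypothetical, and the sketch assumes that jars placed in the dist-cache end up in the container's working directory so they can be referenced as "./name". It does not perform the upload in the second option; it only records where the jars would live afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the three jar-sourcing options; not a YARN or MapReduce API.
class AmJarPlan {
    enum Source { PUBLIC_DFS, CLIENT_UPLOAD, NODE_INSTALL }

    // FS locations the client would register with the dist-cache.
    final List<String> distCacheEntries = new ArrayList<String>();
    // Classpath entries the AM container would be launched with.
    final List<String> classpathEntries = new ArrayList<String>();

    // jarDir: public DFS dir (option 1), local client dir (option 2),
    //         or install dir present on every node (option 3).
    // stagingDir: private DFS dir, used only by option 2.
    static AmJarPlan plan(Source source, String jarDir, String stagingDir,
                          List<String> jarNames) {
        AmJarPlan result = new AmJarPlan();
        for (String jar : jarNames) {
            switch (source) {
                case PUBLIC_DFS:
                    // Jars are already on DFS; just point the dist-cache at them.
                    result.distCacheEntries.add(jarDir + "/" + jar);
                    result.classpathEntries.add("./" + jar);
                    break;
                case CLIENT_UPLOAD:
                    // Client would first copy jarDir/jar to stagingDir (not done here).
                    result.distCacheEntries.add(stagingDir + "/" + jar);
                    result.classpathEntries.add("./" + jar);
                    break;
                case NODE_INSTALL:
                    // Nothing goes through the dist-cache; the classpath
                    // refers directly to the per-node installation.
                    result.classpathEntries.add(jarDir + "/" + jar);
                    break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("mr-app.jar", "mr-common.jar");
        AmJarPlan p = plan(Source.PUBLIC_DFS, "/apps/mr/lib", "/user/alice/.staging", jars);
        System.out.println("dist-cache: " + p.distCacheEntries);
        System.out.println("classpath:  " + p.classpathEntries);
    }
}
```

In this model only the third option leaves the dist-cache empty, which is why it depends on the jars being deployed on every node.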

                
> Remove dependency on deployed MR jars
> -------------------------------------
>
>                 Key: MAPREDUCE-4421
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Vinod Kumar Vavilapalli
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit dependency
on YARN_APPLICATION_CLASSPATH. 
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, probably, just
rely on adding a shaded MR jar along with job.jar to the dist-cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
