hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4421) Remove dependency on deployed MR jars
Date Tue, 01 Oct 2013 16:33:23 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4421:
----------------------------------

    Attachment: MAPREDUCE-4421-3.patch

Thanks for taking another look, Hitesh.

bq. Regarding addMRFrameworkToDistributedCache() - one minor question: the code allows for
a non-qualified URI. Should we enforce provision of a fully-qualified path always?

I thought it would be easier to let the URI be qualified by the cluster's configured defaults
if it is not already fully qualified.  Otherwise users/admins could not simply say
"hdfs:/path/to/archive"; they would have to say "hdfs://namenode:port/path/to/archive", and
the setting would break if/when the name or port of the filesystem changes.  If we let it be
qualified by cluster defaults, then admins can update the default filesystem in core-site and
the simpler forms continue to work unmodified.
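
To illustrate the qualification behavior described above, here is a minimal sketch using only
java.net.URI. It is not the code from the patch and not Hadoop's own implementation; the class
QualifySketch, the helper qualify(), and the example default filesystem "hdfs://namenode:8020"
are all hypothetical names chosen for illustration. It assumes the missing scheme and authority
are simply copied from the default filesystem URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; not taken from the MAPREDUCE-4421 patch.
final class QualifySketch {

    // Fills in a missing scheme and/or authority from the default filesystem URI.
    // A URI that already has both is returned unchanged.
    static URI qualify(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        String authority = path.getAuthority();
        if (scheme != null && authority != null) {
            return path; // already fully qualified
        }
        if (scheme == null) {
            // No scheme at all: take scheme (and authority, if absent) from the default.
            scheme = defaultFs.getScheme();
            if (authority == null) {
                authority = defaultFs.getAuthority();
            }
        } else if (scheme.equals(defaultFs.getScheme())) {
            // Same scheme as the default but no host:port: borrow the default authority.
            authority = defaultFs.getAuthority();
        }
        try {
            return new URI(scheme, authority, path.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default filesystem; in a real cluster this would come from core-site.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        System.out.println(qualify(URI.create("hdfs:/path/to/archive"), defaultFs));
        System.out.println(qualify(URI.create("/path/to/archive"), defaultFs));
    }
}
```

With this approach, changing only the assumed default filesystem URI changes the result for
both short forms, which is the property that lets admins rename the namenode without touching
the archive setting.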

bq. Minor nit: I believe there should be nothing in the implementation that requires HDFS
as the storage for the MR tarball?

Good point.  I updated the documentation to refer to a distributed cache deploy rather than
an HDFS deploy.  However, I did call out in the docs the performance ramifications of not using
the cluster's default filesystem and a publicly-readable path for the archive.  Without both,
the job submitter could end up re-uploading and the nodes re-localizing the framework for
each job or each user.  It will work, but it will be slower than necessary.

> Remove dependency on deployed MR jars
> -------------------------------------
>
>                 Key: MAPREDUCE-4421
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-4421-2.patch, MAPREDUCE-4421-3.patch, MAPREDUCE-4421.patch, MAPREDUCE-4421.patch
>
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit dependency on YARN_APPLICATION_CLASSPATH.
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, probably, just rely on adding a shaded MR jar along with job.jar to the dist-cache.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
