hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
Date Wed, 29 Oct 2014 21:47:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189066#comment-14189066 ]

Jason Lowe commented on MAPREDUCE-6128:

Thanks for the patch, Gera.

I think it's an interesting idea, but I'm worried about it being enabled by default.  If the
user is already manually adding jars to the classpath, and these jars have already been preloaded
into HDFS and referenced there for efficient localization across jobs, then enabling this
would actively make things worse: extra jars get uploaded to HDFS and localized, or the job
fails outright if the distributed cache names collide.  This either needs to
be disabled by default or it needs to look for duplicate jar names already in the distributed
cache (maybe both).
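
To make the duplicate-name check concrete, here is a minimal sketch in plain Java. It is not
code from the patch: the class CacheNameFilter and its methods are hypothetical names, paths
are handled as plain strings rather than through the Hadoop API, and any "#linkname" URI
fragment is ignored for simplicity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop or of MAPREDUCE-6128.v01.patch.
class CacheNameFilter {
    /** Returns the last path segment, e.g. "hdfs://nn/libs/a.jar" becomes "a.jar". */
    static String baseName(String path) {
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    /**
     * Returns the bundled jars whose file names are not already taken,
     * either by an entry already in the distributed cache or by an
     * earlier bundled jar with the same name.
     */
    static List<String> filterNew(List<String> alreadyCached, List<String> bundled) {
        Set<String> taken = new HashSet<>();
        for (String p : alreadyCached) {
            taken.add(baseName(p));
        }
        List<String> result = new ArrayList<>();
        for (String p : bundled) {
            // Set.add returns false when the name is already present.
            if (taken.add(baseName(p))) {
                result.add(p);
            }
        }
        return result;
    }
}
```

With a check like this, a jar that the user has already preloaded into HDFS would be skipped
rather than uploaded again or allowed to collide.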

It would also be nice to have a unit test to verify this feature doesn't break at some point.
For example, we could build a small jar with an MR job that has a trivial dependency on another
separate, small jar, then try to submit it to a minicluster with just the job jar to verify
the automatic bundling is working.

> Automatic addition of bundled jars to distributed cache 
> --------------------------------------------------------
>                 Key: MAPREDUCE-6128
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.5.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-6128.v01.patch
> On the client side, JDK adds Class-Path elements from the job jar manifest
> on the classpath. In theory there could be many bundled jars in many directories such
> that adding them manually via libjars or similar means to task classpaths is cumbersome. If
> this property is enabled, the same jars are added
> to the task classpaths automatically.
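
For reference, the Class-Path attribute mentioned in the description can be read with the
standard java.util.jar API. The sketch below is only an illustration of that lookup, not the
patch's implementation; the class name ManifestClassPath is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only.
class ManifestClassPath {
    /** Returns the whitespace-separated Class-Path entries of a manifest, or an empty list. */
    static List<String> entries(Manifest manifest) {
        List<String> result = new ArrayList<>();
        String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (value == null) {
            return result; // the manifest declares no Class-Path
        }
        for (String entry : value.trim().split("\\s+")) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }
}
```

For a jar on disk, the manifest can be obtained with new JarFile(path).getManifest(), which
returns null when the jar has no manifest. The entries are relative URLs that would still need
to be resolved against the job jar's location.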

This message was sent by Atlassian JIRA