hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6128) Automatic addition of bundled jars to distributed cache
Date Wed, 19 Nov 2014 21:19:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218513#comment-14218513 ]

Jason Lowe commented on MAPREDUCE-6128:
---------------------------------------

Thanks for updating the patch, Gera.

FILTER_TEST_CLASSES is too generic a name. It's a very specific config that only applies
to minicluster setups, but FILTER_TEST_CLASSES sounds like it would filter in other situations
too, so maybe MINICLUSTER_JOB_FILTER_TEST_CLASSES would be better? Similarly, the property name
itself, childjvm.without.test.classes, doesn't reflect when it applies.

The FILTER_TEST_CLASSES thing is so specific (it could be called DO_TESTMRJOBS_HACK ;-) )
that I'm wondering if it would be better to localize it to the one place it's needed rather
than "advertise" it as a config. TestMRJobs could set up job confs so that job classpaths are
set to the filtered classpath: it could make sure jobs get confs that don't have yarn.is.minicluster
set and whose env properties set CLASSPATH=<filtered classpath>. Not a must-fix,
but something to consider.

For the job submitter changes, I'm not sure why we need to be specific about ".jar". As I
see it, the manifest is going to add some files to the distributed cache, and we want to avoid
collisions with other things being added to that cache. Is it important that we check only
for .jar files? If the manifest asks for a non-jar file, why wouldn't we want that localized
as well?
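To illustrate the point, here is a minimal sketch using only the JDK (it is not code from the patch; the class ManifestClassPath and its entries method are hypothetical). It collects every Class-Path entry from a manifest's main section without filtering on the ".jar" extension, assuming entries are the usual space-separated relative URLs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not code from the patch: returns every Class-Path
// entry of a manifest without filtering on the ".jar" extension.
class ManifestClassPath {
  static List<String> entries(String manifestText) {
    Manifest mf;
    try {
      mf = new Manifest(new ByteArrayInputStream(
          manifestText.getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> result = new ArrayList<>();
    String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    if (cp == null) {
      return result; // no Class-Path attribute in the main section
    }
    // Class-Path is a space-separated list of relative URLs; keep them all,
    // whether they name jars, directories, or other resources.
    for (String entry : cp.trim().split("\\s+")) {
      if (!entry.isEmpty()) {
        result.add(entry);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String mf = "Manifest-Version: 1.0\n"
        + "Class-Path: lib/a.jar conf/log4j.properties lib/b.jar\n\n";
    // Prints all three entries, including the non-jar properties file.
    System.out.println(entries(mf));
  }
}
```

Whatever extension an entry has, the collision check against the rest of the distributed cache would then apply uniformly.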

If the manifest asks for two different jars with the same basename then I think it will silently
skip the latter entry.  Intentional?
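If skipping is intentional, it might at least be worth detecting explicitly. A minimal sketch, assuming the localized link name is simply the last path component of each entry (URI fragments ignored); BasenameCollisions and duplicates are hypothetical names, not part of the patch or of Hadoop:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check: groups manifest entries by basename and keeps only the
// basenames that more than one entry maps to.
class BasenameCollisions {
  static String basename(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  // Returns basename -> every entry sharing it, for basenames used more than once.
  static Map<String, List<String>> duplicates(List<String> entries) {
    Map<String, List<String>> byName = new LinkedHashMap<>();
    for (String e : entries) {
      byName.computeIfAbsent(basename(e), k -> new ArrayList<>()).add(e);
    }
    byName.values().removeIf(group -> group.size() < 2);
    return byName;
  }

  public static void main(String[] args) {
    List<String> cp = Arrays.asList("lib/v1/util.jar", "lib/a.jar", "lib/v2/util.jar");
    // Reports that util.jar is requested twice from different directories.
    System.out.println(duplicates(cp));
  }
}
```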

Theoretically, distributed cache archives could also conflict with the manifest, so I'm thinking
the manifest should be processed after archives and the conflict check should also check the
archive list.

We may want an info (debug?) log message when manifest entries are overridden by other distributed
cache entries.
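The last two suggestions could fit together roughly as follows. This is a hypothetical sketch, not the patch's code: ManifestMerge and entriesToAdd are made-up names, java.util.logging stands in for whatever logger the job submitter actually uses, and link names are again assumed to be plain basenames. Manifest entries are considered only after both cache files and archives have claimed their names, and every skipped entry is logged instead of dropped silently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch: manifest entries go in last and yield to any
// distributed cache file or archive that already uses the same link name.
class ManifestMerge {
  private static final Logger LOG = Logger.getLogger(ManifestMerge.class.getName());

  static String basename(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? path : path.substring(slash + 1);
  }

  static List<String> entriesToAdd(Collection<String> cacheFiles,
      Collection<String> cacheArchives, List<String> manifestEntries) {
    Set<String> taken = new HashSet<>();
    for (String f : cacheFiles) {
      taken.add(basename(f));
    }
    for (String a : cacheArchives) {
      taken.add(basename(a)); // archives can collide with the manifest too
    }
    List<String> toAdd = new ArrayList<>();
    for (String e : manifestEntries) {
      String name = basename(e);
      if (taken.add(name)) {
        toAdd.add(e);
      } else {
        // Covers overrides by files/archives and duplicates within the manifest.
        LOG.info("Skipping manifest entry " + e + ": link name " + name
            + " is already used in the distributed cache");
      }
    }
    return toAdd;
  }

  public static void main(String[] args) {
    List<String> result = entriesToAdd(
        Arrays.asList("hdfs:///user/x/a.jar"),
        Arrays.asList("hdfs:///user/x/deps.zip"),
        Arrays.asList("lib/a.jar", "lib/deps.zip", "lib/c.jar"));
    System.out.println(result); // only lib/c.jar survives
  }
}
```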


> Automatic addition of bundled jars to distributed cache 
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6128
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6128
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.5.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: MAPREDUCE-6128.v01.patch, MAPREDUCE-6128.v02.patch, MAPREDUCE-6128.v03.patch,
> MAPREDUCE-6128.v04.patch, MAPREDUCE-6128.v05.patch, MAPREDUCE-6128.v06.patch, MAPREDUCE-6128.v07.patch
>
>
> On the client side, the JDK adds Class-Path elements from the job jar manifest
> to the classpath. In theory there could be many bundled jars in many directories, such
> that adding them manually to task classpaths via libjars or similar means is cumbersome.
> If this property is enabled, the same jars are added to the task classpaths automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
