hadoop-mapreduce-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6344) Inconsistent classpath/classloading from DistributedCache archives
Date Tue, 28 Apr 2015 21:21:06 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518092#comment-14518092
] 

Sangjin Lee commented on MAPREDUCE-6344:
----------------------------------------

I understand this is a change of behavior from MRv1, but I'm not convinced that this is a
bug. As you noted, Java makes no promise with regard to the order of enumeration. Furthermore,
if your code depends on the ordering of jars, I would definitely consider that a bug. It implies
there are multiple copies of the same class sitting in your classpath, and relying on the
ordering to get it right is futile.
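
One way to check whether a job is in this situation is to look for class entries that appear in more than one jar. The following is a minimal sketch only, not part of Hadoop; the class name DuplicateClassFinder and both helper methods are hypothetical, and it relies on nothing beyond the standard java.util.jar API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Hadoop: reports classes provided by more than one jar.
public class DuplicateClassFinder {

    /** Lists the .class entry names contained in one jar file. */
    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** Maps each class entry found in two or more jars to the jars that provide it. */
    static Map<String, List<String>> findDuplicates(Map<String, List<String>> classesByJar) {
        Map<String, List<String>> providers = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : classesByJar.entrySet()) {
            for (String cls : e.getValue()) {
                providers.computeIfAbsent(cls, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Keep only the classes that have more than one provider.
        providers.values().removeIf(jars -> jars.size() < 2);
        return providers;
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is assumed to be a path to a jar file.
        Map<String, List<String>> classesByJar = new LinkedHashMap<>();
        for (String jarPath : args) {
            classesByJar.put(jarPath, classEntries(jarPath));
        }
        findDuplicates(classesByJar)
            .forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```

Running it with the jar paths of a job as arguments prints every class that has more than one provider; an empty result would suggest that classpath ordering cannot affect which copy gets loaded.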

> Inconsistent classpath/classloading from DistributedCache archives
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6344
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6344
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.5.2
>            Reporter: Preston Koprivica
>
> We recently upgraded to MRv2 on YARN and have been noticing very inconsistent classloading
> between the job submission client and the tasks as they start up.
> I've tracked the issue to this method:
> https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L264
> It appears that the classpath is simply "wildcarded". According to the Java SE 7 and 8
> docs, the order of enumeration is not specified and may differ from moment to moment [1][2].
> This is a problem for applications that rely on strict ordering, which the MRv1 DistributedCache
> used to provide.
> I'm unable to track down all the things that are linked or landed into the $PWD of the
> container, but assuming we can't account for all these things, a simple solution could be
> to explicitly enumerate the files in DistributedCache - similar to the "non jar" case [3]
> - and then add the "*" for passivity.
> [1] http://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.htm
> [2] http://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762
> [3] https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L270
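
The workaround proposed in the description (list the DistributedCache files explicitly, then append the wildcard) can be illustrated with a small sketch. This is not the MRApps code; the class ExplicitClasspathSketch, the method buildClasspath, and its parameters are hypothetical names chosen for illustration. The explicit entries come first and keep the order in which they were supplied, and the trailing "*" still picks up anything else in the working directory.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the proposed ordering, not the actual MRApps implementation.
public class ExplicitClasspathSketch {

    /**
     * Builds a classpath string in which the given cache files appear first,
     * in the order supplied, followed by a wildcard entry for the directory.
     */
    static String buildClasspath(List<String> cacheFileNames, String pwd, String separator) {
        // LinkedHashSet keeps insertion order and drops repeated names.
        Set<String> entries = new LinkedHashSet<>();
        for (String name : cacheFileNames) {
            entries.add(pwd + "/" + name);
        }
        // The wildcard goes last, so explicitly listed entries are searched first.
        entries.add(pwd + "/*");
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // Example file names are made up for the demonstration.
        List<String> cacheFiles = Arrays.asList("lib-b.jar", "lib-a.jar");
        System.out.println(buildClasspath(cacheFiles, "$PWD", ":"));
        // prints: $PWD/lib-b.jar:$PWD/lib-a.jar:$PWD/*
    }
}
```

Because the JVM searches classpath entries in the order they are listed, the explicit entries would take precedence over whatever the wildcard expands to.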



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
