hadoop-mapreduce-issues mailing list archives

From "Preston Koprivica (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6344) Inconsistent classpath/classloading from DistributedCache archives
Date Tue, 28 Apr 2015 18:52:08 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Preston Koprivica updated MAPREDUCE-6344:
-----------------------------------------
    Description: 
We recently upgraded to MRv2 on YARN and have been noticing very inconsistent classloading
between the job submission client and the tasks as they start up. 

I've tracked the issue to this method:

https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L264

It appears that the classpath is simply "wildcarded".  According to the Java SE 7 and 8 docs,
the order in which a wildcard's JAR files are enumerated is not specified and may differ from
moment to moment [1][2].  This is a problem for applications that rely on strict ordering,
which the MRv1 DistributedCache used to provide.
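
To illustrate why enumeration order matters, here is a toy model of classpath lookup in which the first entry that defines a class wins. It is only a sketch: the Jar type, the resolve method, and the jar and class names are hypothetical and are not taken from Hadoop or the JVM.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of classpath lookup: the first entry that defines a class wins. */
final class ClasspathOrderDemo {

    /** Hypothetical stand-in for a jar: its file name plus the classes it contains. */
    static final class Jar {
        final String name;
        final List<String> classes;

        Jar(String name, List<String> classes) {
            this.name = name;
            this.classes = classes;
        }
    }

    /** Returns the name of the first jar, in classpath order, that contains className, or null. */
    static String resolve(List<Jar> classpath, String className) {
        for (Jar jar : classpath) {
            if (jar.classes.contains(className)) {
                return jar.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Two versions of the same (made-up) library define the same class.
        Jar v1 = new Jar("lib-1.0.jar", Arrays.asList("com.example.Util"));
        Jar v2 = new Jar("lib-2.0.jar", Arrays.asList("com.example.Util"));

        // Same set of jars, different enumeration order, different winner.
        System.out.println(resolve(Arrays.asList(v1, v2), "com.example.Util")); // lib-1.0.jar
        System.out.println(resolve(Arrays.asList(v2, v1), "com.example.Util")); // lib-2.0.jar
    }
}
```

If a wildcard expands in one order on the client and in another inside a container, the two sides can end up loading different versions of the same class.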

I'm unable to track down everything that is linked or placed into the $PWD of the container,
but assuming we can't account for all of it, a simple solution could be to explicitly
enumerate the files in DistributedCache - similar to the "non jar" case [3] - and then add
the "*" for passivity (backward compatibility).
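
A minimal sketch of that idea, assuming the cache file names are already available in the desired order. This is not the MRApps code; buildClasspath and its parameters are hypothetical names used only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: explicit, ordered classpath entries followed by a trailing wildcard. */
final class ExplicitClasspathSketch {

    /**
     * Builds a classpath string that lists each cache file explicitly, in the
     * given order, and then appends "pwd/*" so that anything not enumerated is
     * still picked up (with unspecified order, but only after the explicit entries).
     */
    static String buildClasspath(List<String> cacheFileNames, String pwd, String separator) {
        // LinkedHashSet keeps insertion order and drops duplicate entries.
        Set<String> entries = new LinkedHashSet<>();
        for (String name : cacheFileNames) {
            entries.add(pwd + "/" + name);
        }
        entries.add(pwd + "/*");
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        String cp = buildClasspath(java.util.Arrays.asList("a.jar", "b.jar"), "$PWD", ":");
        System.out.println(cp); // $PWD/a.jar:$PWD/b.jar:$PWD/*
    }
}
```

Because the explicit entries come first, they take precedence over whatever the wildcard expands to, while the wildcard preserves the current behavior for everything else.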

[1] http://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.htm
[2] http://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762
[3] https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L270


  was:
We recently upgraded to MRv2 on YARN and have been noticing very inconsistent classloading
between the job submission client and the tasks as they start up. 

I've tracked the issue to this method:

https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L264

It appears that the classpath is simply "wild carded".  According the javase 7&8 docs,
the order of enumeration is not specified and may differ from moment to moment [1][2].  This
is a problem for applications that rely on strict ordering, which the MRv1 DistributedCache
used to honor.

I'm unable to track down all the things that are linked or landed into the \$PWD of the container,
but assuming we can't account for all these things, a simple solution could be to explicitly
enumerate the files in DistributedCache - similar to the "non jar" case [3] - and then add
the "*" for passivity.  

[1] http://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.htm
[2] http://docs.oracle.com/javase/8/docs/technotes/tools/windows/classpath.html#A1100762
[3] https://github.com/apache/hadoop/blob/release-2.5.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java#L270



> Inconsistent classpath/classloading from DistributedCache archives
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6344
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6344
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.5.2
>            Reporter: Preston Koprivica



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
