hadoop-mapreduce-user mailing list archives

From John Armstrong <john.armstr...@ccri.com>
Subject Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2
Date Wed, 01 Jun 2011 19:38:36 GMT
On Tue, 31 May 2011 15:09:28 -0400, John Armstrong
<john.armstrong@ccri.com> wrote:
> On Tue, 31 May 2011 12:02:28 -0700, Alejandro Abdelnur
> <tucu@cloudera.com> wrote:
>> What is exactly that does not work?

In the hopes that more information can help, I've dug into the local
filesystems on each of my four nodes and retrieved the job.xml and the
locations of the files to show that everything shows up where it should.

In this example I have one regular file
(hdfs://node1:hdfsport/hdfs/path/to/file1.foo) added with
DistributedCache.addCacheFile().  I also have a JAR
(hdfs://node1:hdfsport/hdfs/path/to/needed.jar) added with
DistributedCache.addFileToClassPath().  The needed JAR is also part of the
classpath Oozie provides to my Java task.

As you can see, both files (with correct filesizes and timestamps) are
listed as cache files in job.xml, and the JAR is listed as a classpath
file.  Both files show up on each node; the JAR shows up twice on node 1
since that's where Oozie ran the Java task, and thus where Oozie placed the
JAR with its own use of the distributed cache.

And yet, when MapReduce actually tries to run the job my Java task
launches, it immediately hits a ClassNotFoundException, claiming it can't
find the class my.class.package.Needed, which is contained in needed.jar.

JOB.XML
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.job.classpath.files</name>
        <value>hdfs://node1:hdfsport/hdfs/path/to/needed.jar</value>
    </property>
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.cache.files</name>
        <value>hdfs://node1:hdfsport/hdfs/path/to/file1.foo,hdfs://node1:hdfsport/hdfs/path/to/needed.jar</value>
    </property>
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.cache.files.filesizes</name>
        <value>61175,2257057</value>
    </property>
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.cache.files.timestamps</name>
        <value>1306949104866,1306949371660</value>
    </property>
...
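
One quick sanity check on an excerpt like the one above is that the three
mapred.cache.files* lists have the same number of entries and that the
classpath entry also appears among the cache files. Below is a minimal
sketch in plain Java. CacheConfigCheck and its methods are hypothetical
helpers, not part of Hadoop, and the values in main() are copied from the
excerpt. Since the excerpt has only one classpath entry, the sketch treats
that property as a single value and does not assume how multiple entries
would be delimited.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): cross-checks distributed-cache
// property values copied by hand from job.xml.
final class CacheConfigCheck {

    // Splits a comma-separated property value into trimmed, non-empty entries.
    static List<String> split(String value) {
        List<String> out = new ArrayList<String>();
        for (String s : value.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns a list of problems; an empty list means the values agree.
    // classpathFile is assumed to hold exactly one entry, as in the excerpt.
    static List<String> check(String cacheFiles, String fileSizes,
                              String timestamps, String classpathFile) {
        List<String> problems = new ArrayList<String>();
        List<String> files = split(cacheFiles);
        if (split(fileSizes).size() != files.size()) {
            problems.add("filesizes count differs from cache files count");
        }
        if (split(timestamps).size() != files.size()) {
            problems.add("timestamps count differs from cache files count");
        }
        if (!files.contains(classpathFile.trim())) {
            problems.add("classpath entry is not a cache file: " + classpathFile);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(
            "hdfs://node1:hdfsport/hdfs/path/to/file1.foo,"
                + "hdfs://node1:hdfsport/hdfs/path/to/needed.jar",
            "61175,2257057",
            "1306949104866,1306949371660",
            "hdfs://node1:hdfsport/hdfs/path/to/needed.jar");
        System.out.println(problems.isEmpty() ? "consistent" : problems.toString());
    }
}
```

A check like this only confirms that the configuration is internally
consistent; it says nothing about whether the task JVM ends up with the JAR
on its classpath.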

NODE 1 LOCAL FILESYSTEM
/data/4/mapred/local/taskTracker/distcache/5181540010607464671_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/1/mapred/local/taskTracker/distcache/6423795395825083633_-1942178119_1279314284/node1/hdfs/path/to/needed.jar
/data/3/mapred/local/taskTracker/distcache/2424191142954514770_1281905983_1269665052/node1/hdfs/path/to/needed.jar

NODE 2 LOCAL FILESYSTEM
/data/1/mapred/local/taskTracker/distcache/-1458632814086969626_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/4434671176913378591_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

NODE 3 LOCAL FILESYSTEM
/data/1/mapred/local/taskTracker/distcache/-6763452370915390695_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/6838381597046551111_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

NODE 4 LOCAL FILESYSTEM
/data/1/mapred/local/taskTracker/distcache/-1759547009148985681_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/1998811135309473771_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

SAMPLE MAPPER ATTEMPT LOG

2011-06-01 14:21:41,442 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2011-06-01 14:21:41,557 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/jars/job.jar <- /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/attempt_201106011430_0002_m_000009_0/work/./job.jar
2011-06-01 14:21:41,560 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/jars/.job.jar.crc <- /data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/attempt_201106011430_0002_m_000009_0/work/./.job.jar.crc
2011-06-01 14:21:41,563 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-06-01 14:21:41,660 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.ClassNotFoundException: my.class.package.Needed
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:973)
	at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:236)
	at org.apache.hadoop.mapred.Task.initialize(Task.java:484)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:298)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
	at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: my.class.package.Needed
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:920)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:971)
	... 8 more
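
The trace shows the lookup going from Configuration.getClassByName into
sun.misc.Launcher$AppClassLoader, so one way to narrow this down is to
print what that class loader can actually see from inside the JVM in
question. Below is a minimal sketch that uses only the JDK. ClassProbe and
its methods are hypothetical names, and where to call it from is left open,
since nothing in this thread establishes a suitable hook.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of Hadoop or Oozie).
final class ClassProbe {

    // True if the loader can resolve the class; the class is not initialized.
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked; report as not loadable.
            return false;
        }
    }

    // Non-empty entries of the JVM's java.class.path system property.
    static List<String> classpathEntries() {
        List<String> entries = new ArrayList<String>();
        String cp = System.getProperty("java.class.path", "");
        for (String entry : cp.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String name = args.length > 0 ? args[0] : "my.class.package.Needed";
        System.out.println(name + " loadable: " + canLoad(name, loader));
        for (String entry : classpathEntries()) {
            System.out.println("classpath: " + entry);
        }
    }
}
```

If needed.jar is missing from the printed entries, the JAR was localized
but never added to the JVM's classpath. If it is present and the class
still fails to load, the JAR contents would be the next thing to inspect.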

