hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Example of deploying jars through DistributedCache?
Date Mon, 02 Mar 2009 06:42:48 GMT
Hi all,

I'm stumped as to how to use the distributed cache's classpath feature. I
have a library of Java classes I'd like to distribute to jobs and use in my
mapper; I figured the DCache's addFileToClassPath() method was the correct
means, given the example at
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html.


I've boiled it down to the following non-working example:

in TestDriver.java:


  private void runJob() throws IOException {
    JobConf conf = new JobConf(getConf(), TestDriver.class);

    // do standard job configuration.
    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    conf.setMapperClass(TestMapper.class);
    conf.setNumReduceTasks(0);

    // load aaronTest2.jar into the dcache; this contains the class ValueProvider.
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
    DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"), conf);

    // run the job.
    JobClient.runJob(conf);
  }


... and then in TestMapper:

  public void map(LongWritable key, Text value,
      OutputCollector<LongWritable, Text> output, Reporter reporter)
      throws IOException {

    try {
      ValueProvider vp = (ValueProvider) Class.forName("ValueProvider").newInstance();
      Text val = vp.getValue();
      output.collect(new LongWritable(1), val);
    } catch (ClassNotFoundException e) {
      throw new IOException("not found: " + e.toString()); // Class.forName() throws to here.
    } catch (Exception e) {
      throw new IOException("Exception: " + e.toString());
    }
  }


The class "ValueProvider" is to be loaded from aaronTest2.jar. I can verify that this code works if I put ValueProvider into the main jar I deploy. I can also verify that aaronTest2.jar makes it into ${mapred.local.dir}/taskTracker/archive/.

But when run with ValueProvider in aaronTest2.jar, the job fails with:

$ bin/hadoop jar aaronTest1.jar TestDriver
09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
09/03/01 22:36:05 INFO mapred.JobClient:  map 0% reduce 0%
09/03/01 22:36:14 INFO mapred.JobClient: Task Id : attempt_200903012210_0005_m_000000_0, Status : FAILED
java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
    at TestMapper.map(Unknown Source)
    at TestMapper.map(Unknown Source)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)


Do I need to do something else (maybe in Mapper.configure()?) to actually classload the jar? The documentation leads me to believe it should already be on the classpath after doing only what I've done above. I'm on Hadoop 0.18.3.
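
To make the question concrete, here is roughly the kind of manual classloading I imagine configure() might need. This is an untested sketch (using java.net.URLClassLoader); the "loader" field is my own invention, and I'm not even sure DistributedCache.getLocalCacheFiles() is the right call for files added via addFileToClassPath():

  // Hypothetical sketch only; the "loader" field and the getLocalCacheFiles() call are guesses on my part.
  private ClassLoader loader;

  public void configure(JobConf conf) {
    try {
      // Local paths of whatever the TaskTracker pulled out of the distributed cache for this job.
      Path[] cached = DistributedCache.getLocalCacheFiles(conf);
      URL[] urls = new URL[cached == null ? 0 : cached.length];
      for (int i = 0; i < urls.length; i++) {
        urls[i] = cached[i].toUri().toURL();  // e.g. file:/.../taskTracker/archive/.../aaronTest2.jar
      }
      loader = new URLClassLoader(urls, getClass().getClassLoader());
    } catch (Exception e) {
      throw new RuntimeException("could not set up classloader", e);
    }
  }

  // ... and then in map(), something like:
  //   ValueProvider vp = (ValueProvider) Class.forName("ValueProvider", true, loader).newInstance();

If something like that is genuinely required I'd be surprised, given what the javadoc implies, but I wanted to spell out what I meant.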

Thanks,
- Aaron
