hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Example of deploying jars through DistributedCache?
Date Wed, 08 Apr 2009 21:42:25 GMT
Ooh. The other DCache-based operations assume that you're dcaching files
already resident in HDFS. I guess this assumes that the filenames are on the
local filesystem.

- Aaron
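
For anyone digging this thread out of the archive later: here is a rough sketch combining the copy-to-HDFS step from the original mail with the addArchiveToClassPath() call Brian describes below, written against the 0.18-era API. The helper class, method name, and paths are placeholders for illustration, not anything from the original job:

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CacheJarHelper {

    // Copies a local jar into HDFS and registers the HDFS copy so the
    // TaskTracker puts the localized archive on the child JVM's classpath.
    // Call this from the driver before submitting the job.
    public static void shipJar(Configuration conf, String localJar,
        String hdfsJar) throws IOException {
      FileSystem fs = FileSystem.get(conf);
      Path dst = new Path(hdfsJar);

      // The DistributedCache expects the archive to already live in HDFS.
      fs.copyFromLocalFile(new Path(localJar), dst);

      // Add the HDFS copy to the task classpath.
      DistributedCache.addArchiveToClassPath(dst, conf);
    }
  }

From a driver this would be called along the lines of shipJar(conf, "aaronTest2.jar", "/tmp/aaronTest2.jar") before JobClient.runJob(conf).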

On Wed, Apr 8, 2009 at 8:32 AM, Brian MacKay <Brian.MacKay@medecision.com> wrote:

>
> I use addArchiveToClassPath, and it works for me.
>
> DistributedCache.addArchiveToClassPath(new Path(path), conf);
>
> I was curious about this block of code.  Why are you copying to tmp?
>
> >    FileSystem fs = FileSystem.get(conf);
> >    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
>
> -----Original Message-----
> From: Tom White [mailto:tom@cloudera.com]
> Sent: Wednesday, April 08, 2009 9:36 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Example of deploying jars through DistributedCache?
>
> Does it work if you use addArchiveToClassPath()?
>
> Also, it may be more convenient to use GenericOptionsParser's -libjars
> option.
>
> Tom
>
> On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball <aaron@cloudera.com> wrote:
> > Hi all,
> >
> > I'm stumped as to how to use the distributed cache's classpath feature. I
> > have a library of Java classes I'd like to distribute to jobs and use in
> > my mapper; I figured the DCache's addFileToClassPath() method was the
> > correct means, given the example at
> > http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html.
> >
> >
> > I've boiled it down to the following non-working example:
> >
> > in TestDriver.java:
> >
> >
> >  private void runJob() throws IOException {
> >    JobConf conf = new JobConf(getConf(), TestDriver.class);
> >
> >    // do standard job configuration.
> >    FileInputFormat.addInputPath(conf, new Path("input"));
> >    FileOutputFormat.setOutputPath(conf, new Path("output"));
> >
> >    conf.setMapperClass(TestMapper.class);
> >    conf.setNumReduceTasks(0);
> >
> >    // load aaronTest2.jar into the dcache; this contains the class ValueProvider
> >    FileSystem fs = FileSystem.get(conf);
> >    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
> >    DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"), conf);
> >
> >    // run the job.
> >    JobClient.runJob(conf);
> >  }
> >
> >
> > .... and then in TestMapper:
> >
> >  public void map(LongWritable key, Text value, OutputCollector<LongWritable, Text> output,
> >      Reporter reporter) throws IOException {
> >
> >    try {
> >      ValueProvider vp = (ValueProvider) Class.forName("ValueProvider").newInstance();
> >      Text val = vp.getValue();
> >      output.collect(new LongWritable(1), val);
> >    } catch (ClassNotFoundException e) {
> >      throw new IOException("not found: " + e.toString()); // newInstance() throws to here.
> >    } catch (Exception e) {
> >      throw new IOException("Exception:" + e.toString());
> >    }
> >  }
> >
> >
> > The class "ValueProvider" is to be loaded from aaronTest2.jar. I can
> verify
> > that this code works if I put ValueProvider into the main jar I deploy. I
> > can verify that aaronTest2.jar makes it into the
> > ${mapred.local.dir}/taskTracker/archive/
> >
> > But when run with ValueProvider in aaronTest2.jar, the job fails with:
> >
> > $ bin/hadoop jar aaronTest1.jar TestDriver
> > 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> > 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> > 09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
> > 09/03/01 22:36:05 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/03/01 22:36:14 INFO mapred.JobClient: Task Id : attempt_200903012210_0005_m_000000_0, Status : FAILED
> > java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
> >    at TestMapper.map(Unknown Source)
> >    at TestMapper.map(Unknown Source)
> >    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> >    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> >    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> >
> >
> > Do I need to do something else (maybe in Mapper.configure()?) to actually
> > classload the jar? The documentation makes me believe it should already be
> > in the classpath by doing only what I've done above. I'm on Hadoop 0.18.3.
> >
> > Thanks,
> > - Aaron
> >
>

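On the -libjars route Tom mentions: it is only honored when the job is launched through ToolRunner / GenericOptionsParser, and (as noted at the top of this message) it takes jar paths on the local filesystem. A rough skeleton of such a driver is below; the original TestDriver may already be wired this way, since it calls getConf():

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class TestDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
      JobConf conf = new JobConf(getConf(), TestDriver.class);
      // ... the usual input/output/mapper configuration goes here ...
      JobClient.runJob(conf);
      return 0;
    }

    public static void main(String[] args) throws Exception {
      // ToolRunner strips generic options such as -libjars before run() sees args.
      System.exit(ToolRunner.run(new TestDriver(), args));
    }
  }

which would then be invoked as, for example:

  $ bin/hadoop jar aaronTest1.jar TestDriver -libjars aaronTest2.jar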