From: Aaron Kimball
Date: Wed, 8 Apr 2009 14:42:25 -0700
Subject: Re: Example of deploying jars through DistributedCache?
To: core-user@hadoop.apache.org

Ooh. The other DCache-based operations assume that you're dcaching files
already resident in HDFS. I guess this assumes that the filenames are on the
local filesystem.

- Aaron

On Wed, Apr 8, 2009 at 8:32 AM, Brian MacKay wrote:

> I use addArchiveToClassPath, and it works for me.
>
>     DistributedCache.addArchiveToClassPath(new Path(path), conf);
>
> I was curious about this block of code. Why are you copying to tmp?
>
> >   FileSystem fs = FileSystem.get(conf);
> >   fs.copyFromLocalFile(new Path("aaronTest2.jar"),
> >       new Path("tmp/aaronTest2.jar"));
>
> -----Original Message-----
> From: Tom White [mailto:tom@cloudera.com]
> Sent: Wednesday, April 08, 2009 9:36 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Example of deploying jars through DistributedCache?
>
> Does it work if you use addArchiveToClassPath()?
>
> Also, it may be more convenient to use GenericOptionsParser's -libjars
> option.
>
> Tom
>
> On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball wrote:
> > Hi all,
> >
> > I'm stumped as to how to use the distributed cache's classpath feature.
> > I have a library of Java classes I'd like to distribute to jobs and use
> > in my mapper; I figured the DCache's addFileToClassPath() method was the
> > correct means, given the example at
> > http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
> > .
> >
> > I've boiled it down to the following non-working example:
> >
> > in TestDriver.java:
> >
> >   private void runJob() throws IOException {
> >     JobConf conf = new JobConf(getConf(), TestDriver.class);
> >
> >     // do standard job configuration.
> >     FileInputFormat.addInputPath(conf, new Path("input"));
> >     FileOutputFormat.setOutputPath(conf, new Path("output"));
> >
> >     conf.setMapperClass(TestMapper.class);
> >     conf.setNumReduceTasks(0);
> >
> >     // load aaronTest2.jar into the dcache; this contains the class
> >     // ValueProvider
> >     FileSystem fs = FileSystem.get(conf);
> >     fs.copyFromLocalFile(new Path("aaronTest2.jar"),
> >         new Path("tmp/aaronTest2.jar"));
> >     DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"),
> >         conf);
> >
> >     // run the job.
> >     JobClient.runJob(conf);
> >   }
> >
> > .... and then in TestMapper:
> >
> >   public void map(LongWritable key, Text value,
> >       OutputCollector<LongWritable, Text> output,
> >       Reporter reporter) throws IOException {
> >
> >     try {
> >       ValueProvider vp = (ValueProvider)
> >           Class.forName("ValueProvider").newInstance();
> >       Text val = vp.getValue();
> >       output.collect(new LongWritable(1), val);
> >     } catch (ClassNotFoundException e) {
> >       throw new IOException("not found: " + e.toString()); // newInstance() throws to here.
> >     } catch (Exception e) {
> >       throw new IOException("Exception:" + e.toString());
> >     }
> >   }
> >
> > The class "ValueProvider" is to be loaded from aaronTest2.jar. I can
> > verify that this code works if I put ValueProvider into the main jar I
> > deploy. I can verify that aaronTest2.jar makes it into the
> > ${mapred.local.dir}/taskTracker/archive/
> >
> > But when run with ValueProvider in aaronTest2.jar, the job fails with:
> >
> > $ bin/hadoop jar aaronTest1.jar TestDriver
> > 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> > 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> > 09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
> > 09/03/01 22:36:05 INFO mapred.JobClient: map 0% reduce 0%
> > 09/03/01 22:36:14 INFO mapred.JobClient: Task Id : attempt_200903012210_0005_m_000000_0, Status : FAILED
> > java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
> >     at TestMapper.map(Unknown Source)
> >     at TestMapper.map(Unknown Source)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> >
> > Do I need to do something else (maybe in Mapper.configure()?) to
> > actually classload the jar? The documentation makes me believe it should
> > already be in the classpath by doing only what I've done above. I'm on
> > Hadoop 0.18.3.
> >
> > Thanks,
> > - Aaron
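
On the question of whether the mapper has to classload the jar itself: one possible fallback, if the DistributedCache classpath entry does not take effect, is to open the jar by hand with a URLClassLoader. The sketch below uses only the JDK. The class name JarClassLoading, its method names, and the jar location passed in are all hypothetical; this is not what Hadoop does internally, and nobody in this thread proposed it.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper: resolves a class by name from an explicit jar file,
// delegating to the current class loader first.
final class JarClassLoading {

    // Returns true if className is visible to the parent class loader or
    // can be found inside the given jar; false otherwise (including when
    // the jar file does not exist).
    static boolean canLoad(File jar, String className) {
        try (URLClassLoader loader = newLoader(jar)) {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | IOException e) {
            return false;
        }
    }

    // Loads className through a loader backed by the jar and instantiates
    // it with its no-argument constructor. The loader is left open because
    // the new object may need further classes from the same jar later.
    static Object newInstance(File jar, String className) throws Exception {
        URLClassLoader loader = newLoader(jar);
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }

    // Builds a loader whose search path is the parent loader, then the jar.
    private static URLClassLoader newLoader(File jar) throws IOException {
        URL[] urls = { jar.toURI().toURL() };
        return new URLClassLoader(urls, JarClassLoading.class.getClassLoader());
    }
}
```

In a mapper, the File argument would have to point at the task-local copy of aaronTest2.jar. The instance comes back as Object because a compile-time reference to ValueProvider would itself require that class on the task's own classpath.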