hadoop-general mailing list archives

From Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject Re: Specifying the InputFormat class that exists in a JAR on the hdfs
Date Wed, 13 Oct 2010 23:47:33 GMT
Also, you don't necessarily need to use the DistributedCache API from your
application. You can pass the -libjars flag on the command line to supply
additional jars to mappers and reducers.

Take a look:
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Usage
(look for the libjars option)
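
For context, the NoClassDefFoundError quoted below is raised in the JVM that launches the job, at the job.setInputFormatClass() line, so the class has to be loadable there as well as on the task nodes. A minimal, JDK-only sketch for checking that on the launching machine might look like this; ClassVisibility and isLoadable are hypothetical names used for illustration and are not part of Hadoop or Cassandra.

```java
public class ClassVisibility {

    /** Returns true if the given class loader can locate the named class. */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or one of its dependencies is not visible
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in this thread; any class name can be passed instead
        String name = args.length > 0
                ? args[0]
                : "org.apache.cassandra.hadoop.ColumnFamilyInputFormat";
        ClassLoader client = ClassVisibility.class.getClassLoader();
        System.out.println(name + " loadable in this JVM: " + isLoadable(name, client));
    }
}
```

Running it with the same classpath the job client uses should show whether the Cassandra jar is visible on the launching side; a result of false would suggest the jar still needs to be added there (for example through HADOOP_CLASSPATH).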

On Wed, Oct 13, 2010 at 4:41 PM, Shrijeet Paliwal
<shrijeet@rocketfuel.com> wrote:

> Do that only on the machine which is launching the job.
>
>
> On Wed, Oct 13, 2010 at 4:38 PM, Michael Moores <mmoores@real.com> wrote:
>
>> Add it to HADOOP_CLASSPATH on all machines running the task?
>> I can try that, but I'd like users to be able to execute jobs using jars
>> from their own hdfs directory.
>>
>>
>> On Oct 13, 2010, at 4:21 PM, Shrijeet Paliwal wrote:
>>
>> > How about adding it to HADOOP_CLASSPATH if not already.
>> >
>> > On Wed, Oct 13, 2010 at 4:15 PM, Michael Moores <mmoores@real.com> wrote:
>> >
>> >> FYI, I also tried the archive version,
>> >> calling DistributedCache.addArchiveToClassPath(path, configuration);
>> >>
>> >> On Oct 13, 2010, at 4:12 PM, Michael Moores wrote:
>> >>
>> >>> I have specified my InputFormat to be the cassandra ColumnFamilyInputFormat, and also
>> >>> added the cassandra JAR to my classpath via a call to DistributedCache.addFileToClassPath().
>> >>> The JAR exists on the HDFS.
>> >>> When I run my jar I get java.lang.NoClassDefFoundError:
>> >>> org/apache/cassandra/hadoop/ColumnFamilyInputFormat at the line that
>> >>> makes the job.setInputFormatClass() call.
>> >>>
>> >>> I execute the job with "hadoop jar <myjar>".
>> >>>
>> >>> Will I need to put my cassandra JAR on each machine and add it to the JVM startup options?
>> >>>
>> >>> Here is a code snippet:
>> >>>
>> >>> public class MyStats extends Configured implements Tool {
>> >>> ...
>> >>>  public static void main(String[] args) throws Exception {
>> >>>       // Let ToolRunner handle generic command-line options
>> >>>       Configuration configuration = new Configuration();
>> >>>       Path path = new Path("/user/hadoop/profilestats/cassandra-0.7.0-beta2.jar");
>> >>>       log.info("main: adding jars...");
>> >>>       DistributedCache.addFileToClassPath(path, configuration);
>> >>>
>> >>>       ToolRunner.run(configuration, new MyStats(), args);
>> >>>       System.exit(0);
>> >>>   }
>> >>>
>> >>>  public int run(String[] args) throws Exception {
>> >>>     Job job = new Job(getConf(), "myjob");
>> >>>
>> >>>     job.setInputFormatClass(org.apache.cassandra.hadoop.ColumnFamilyInputFormat.class);
>> >>>     ..
>> >>>     return job.waitForCompletion(true) ? 0 : 1;
>> >>>  }
>> >>>
>> >>>
>> >>> FILE LISTING from HDFS:
>> >>>
>> >>> [hadoop@kv-app02 ~]$ hadoop dfs -lsr
>> >>> 10/10/13 14:57:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
>> >>> 10/10/13 14:57:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
>> >>> drwxr-xr-x   - hadoop supergroup          0 2010-10-13 14:34 /user/hadoop/profilestats
>> >>> -rw-r--r--   3 hadoop supergroup    1841467 2010-10-13 14:34 /user/hadoop/profilestats/cassandra-0.7.0-beta2.jar
>> >>
>> >>
>>
>>
>
