hadoop-general mailing list archives

From Michael Moores <mmoo...@real.com>
Subject Re: Specifying the InputFormat class that exists in a JAR on the hdfs
Date Wed, 13 Oct 2010 23:38:35 GMT
Add it to HADOOP_CLASSPATH on all machines running the task?
I can try that, but I'd like users to be able to execute jobs using JARs from their own
HDFS directory.
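
A possible first diagnostic, offered as a sketch rather than a fix: the NoClassDefFoundError in the quoted message is raised at the job.setInputFormatClass() line, which runs in the JVM started by "hadoop jar", before any task is launched. As far as I understand it, DistributedCache.addFileToClassPath() targets the classpath of the tasks, so it may not affect that submitting JVM at all. The small stand-alone class below (ClientClasspathCheck and isLoadable are hypothetical names, not part of Hadoop or Cassandra) only checks whether a class name is visible to the JVM it runs in.

```java
/**
 * Hypothetical helper (not part of Hadoop or Cassandra): reports whether a
 * class can be found by the class loader of the JVM it is running in.
 */
public class ClientClasspathCheck {

    /** Returns true if className can be located without initializing it. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClientClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on this JVM's classpath, or one of its dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error from the quoted message
        String name = args.length > 0
                ? args[0]
                : "org.apache.cassandra.hadoop.ColumnFamilyInputFormat";
        System.out.println(name
                + (isLoadable(name) ? " is visible" : " is NOT visible")
                + " to this JVM");
    }
}
```

Running this in the same environment that submits the job, once with and once without the cassandra JAR in HADOOP_CLASSPATH on that one machine, would show whether the error comes from the submitting side rather than from the task nodes.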


On Oct 13, 2010, at 4:21 PM, Shrijeet Paliwal wrote:

> How about adding it to HADOOP_CLASSPATH, if it is not there already?
> 
> On Wed, Oct 13, 2010 at 4:15 PM, Michael Moores <mmoores@real.com> wrote:
> 
>> FYI, I also tried the archive version,
>> 
>> calling DistributedCache.addArchiveToClassPath(path, configuration);
>> 
>> On Oct 13, 2010, at 4:12 PM, Michael Moores wrote:
>> 
>>> I have specified my InputFormat to be the cassandra ColumnFamilyInputFormat, and also
>>> added the cassandra JAR to my classpath via a call to DistributedCache.addFileToClassPath().
>>> The JAR exists on the HDFS.
>>> When I run my jar I get java.lang.NoClassDefFoundError:
>>> org/apache/cassandra/hadoop/ColumnFamilyInputFormat at the line that
>>> makes the job.setInputFormatClass() call.
>>> 
>>> I execute the job with "hadoop jar <myjar>".
>>> 
>>> Will I need to put my cassandra JAR on each machine and add it to the JVM startup options?
>>> 
>>> Here is a code snippet:
>>> 
>>> public class MyStats extends Configured implements Tool {
>>> ...
>>>  public static void main(String[] args) throws Exception {
>>>       // Let ToolRunner handle generic command-line options
>>>       Configuration configuration = new Configuration();
>>>       Path path = new Path("/user/hadoop/profilestats/cassandra-0.7.0-beta2.jar");
>>>       log.info("main: adding jars...");
>>>       DistributedCache.addFileToClassPath(path, configuration);
>>> 
>>>       // Propagate the exit code returned by run() instead of always exiting with 0
>>>       System.exit(ToolRunner.run(configuration, new MyStats(), args));
>>>   }
>>> 
>>>  public int run(String[] args) throws Exception {
>>>     Job job = new Job(getConf(), "myjob");
>>>     job.setInputFormatClass(org.apache.cassandra.hadoop.ColumnFamilyInputFormat.class);
>>>     ..
>>>     // run() is declared to return int: report success (0) or failure (1)
>>>     return job.waitForCompletion(true) ? 0 : 1;
>>>  }
>>> 
>>> 
>>> FILE LISTING from HDFS:
>>> 
>>> [hadoop@kv-app02 ~]$ hadoop dfs -lsr
>>> 10/10/13 14:57:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
>>> 10/10/13 14:57:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
>>> drwxr-xr-x   - hadoop supergroup          0 2010-10-13 14:34 /user/hadoop/profilestats
>>> -rw-r--r--   3 hadoop supergroup    1841467 2010-10-13 14:34 /user/hadoop/profilestats/cassandra-0.7.0-beta2.jar
>> 
>> 

