hadoop-general mailing list archives

From Michael Moores <mmoo...@real.com>
Subject Re: Specifying the InputFormat class that exists in a JAR on the hdfs
Date Wed, 13 Oct 2010 23:15:43 GMT
FYI, I also tried the archive version, calling:

DistributedCache.addArchiveToClassPath(path, configuration);

On Oct 13, 2010, at 4:12 PM, Michael Moores wrote:

> I have set my InputFormat to the Cassandra ColumnFamilyInputFormat, and also
> added the Cassandra JAR to my classpath via a call to DistributedCache.addFileToClassPath().
> The JAR exists on HDFS.
> When I run my jar I get java.lang.NoClassDefFoundError: org/apache/cassandra/hadoop/ColumnFamilyInputFormat
> at the line that makes the job.setInputFormatClass() call.
> 
> I execute the job with "hadoop jar <myjar>".
> 
> Will I need to put my Cassandra JAR on each machine and add it to the JVM startup options?
> 
> Here is a code snippet:
> 
> public class MyStats extends Configured implements Tool {
> ...
>   public static void main(String[] args) throws Exception {
>        // Let ToolRunner handle generic command-line options
>        Configuration configuration = new Configuration();
>        Path path = new Path("/user/hadoop/profilestats/cassandra-0.7.0-beta2.jar");
>        log.info("main: adding jars...");
>        DistributedCache.addFileToClassPath(path, configuration);
> 
>        ToolRunner.run(configuration, new MyStats(), args);
>        System.exit(0);
>    }
> 
>   public int run(String[] args) throws Exception {
>      Job job = new Job(getConf(), "myjob");
>      job.setInputFormatClass(org.apache.cassandra.hadoop.ColumnFamilyInputFormat.class);
>      ..
>      job.waitForCompletion(true);
>   }
> 
> 
> FILE LISTING from HDFS:
> 
> [hadoop@kv-app02 ~]$ hadoop dfs -lsr
> 10/10/13 14:57:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
> 10/10/13 14:57:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> drwxr-xr-x   - hadoop supergroup          0 2010-10-13 14:34 /user/hadoop/profilestats
> -rw-r--r--   3 hadoop supergroup    1841467 2010-10-13 14:34 /user/hadoop/profilestats/cassandra-0.7.0-beta2.jar
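
A note on the error above: the NoClassDefFoundError is reported at the job.setInputFormatClass() line, which runs in the client JVM started by "hadoop jar", before any task is launched. DistributedCache.addFileToClassPath() affects the classpath of the task JVMs, so it would not be expected to make the class visible to the client. A minimal sketch for checking that assumption, using only the JDK (the class name ClasspathCheck and the helper isLoadable are hypothetical, not part of Hadoop or Cassandra):

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by the class loader
     * that loaded this class, i.e. it is on this JVM's classpath.
     */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class,
            // do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it depends on, is not visible here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.cassandra.hadoop.ColumnFamilyInputFormat";
        System.out.println(name + " loadable in this JVM: " + isLoadable(name));
    }
}
```

If this reports false when run with the same classpath as the job driver, the JAR probably needs to be on the client classpath as well, for example via the HADOOP_CLASSPATH environment variable or by bundling it in the lib/ directory of the job JAR.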

