hadoop-general mailing list archives

From Michael Moores <mmoo...@real.com>
Subject Specifying the InputFormat class that exists in a JAR on the hdfs
Date Wed, 13 Oct 2010 23:12:09 GMT
I have set my InputFormat to Cassandra's ColumnFamilyInputFormat and added the
Cassandra JAR to the classpath via a call to DistributedCache.addFileToClassPath().
The JAR exists on HDFS.
When I run my jar, I get java.lang.NoClassDefFoundError: org/apache/cassandra/hadoop/ColumnFamilyInputFormat
at the line that makes the job.setInputFormatClass() call.

I execute the job with "hadoop jar <myjar>".

Will I need to put the Cassandra JAR on each machine and add it to the JVM startup options?

Here is a code snippet:

public class MyStats extends Configured implements Tool {
   public static void main(String[] args) throws Exception {
        // Let ToolRunner handle generic command-line options
        Configuration configuration = new Configuration();
        Path path = new Path("/user/hadoop/profilestats/cassandra-0.7.0-beta2.jar");
        log.info("main: adding jars...");
        DistributedCache.addFileToClassPath(path, configuration);

        ToolRunner.run(configuration, new MyStats(), args);
   }

   public int run(String[] args) throws Exception {
      Job job = new Job(getConf(), "myjob");
      // ... (rest of run() omitted)
   }
}


[hadoop@kv-app02 ~]$ hadoop dfs -lsr
10/10/13 14:57:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
10/10/13 14:57:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
drwxr-xr-x   - hadoop supergroup          0 2010-10-13 14:34 /user/hadoop/profilestats
-rw-r--r--   3 hadoop supergroup    1841467 2010-10-13 14:34 /user/hadoop/profilestats/cassandra-0.7.0-beta2.jar
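
To narrow down the NoClassDefFoundError described above, it may help to check whether the class is visible to the client JVM that "hadoop jar" starts, since job.setInputFormatClass() runs there and, as far as I understand, DistributedCache.addFileToClassPath() only affects the classpath of the task JVMs. The following is a minimal sketch that uses only the JDK; the class name ClientClasspathCheck and the helper isLoadable() are hypothetical names introduced for illustration, not part of Hadoop or Cassandra.

```java
public class ClientClasspathCheck {

    /** Returns true if this JVM's classpath can load the named class. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClientClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but a dependency could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error; override via the first argument
        String target = args.length > 0
                ? args[0]
                : "org.apache.cassandra.hadoop.ColumnFamilyInputFormat";
        System.out.println(target + " loadable on client: " + isLoadable(target));
        // Print the classpath the client JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this class is packaged into the job jar, it could be run as "hadoop jar <myjar> ClientClasspathCheck". A result of false would suggest the Cassandra JAR is missing from the client-side classpath, which is an assumption worth testing by adding the local JAR path to the HADOOP_CLASSPATH environment variable before launching the job.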
