hadoop-general mailing list archives

From Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject Re: Specifying the InputFormat class that exists in a JAR on the hdfs
Date Wed, 13 Oct 2010 23:41:20 GMT
You only need to do that (add the JAR to HADOOP_CLASSPATH) on the machine that launches the job.
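
The NoClassDefFoundError in the quoted message is raised at the job.setInputFormatClass() call, which runs in the JVM started by "hadoop jar" on the launching machine, so that JVM has to be able to load the class itself. A minimal sketch for checking this before submitting a job is below. The class name ClasspathCheck and the method isLoadable are hypothetical helpers, not part of Hadoop or Cassandra, and the default class name is simply the one from this thread.

```java
// Hypothetical helper (not a Hadoop API): reports whether the current JVM
// can load a class by name from its own classpath.
final class ClasspathCheck {

    // Returns true if the named class can be loaded by this JVM.
    // The class is located and loaded but not initialized (second argument is false).
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, the error seen in this thread.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class from this thread; pass another name as the first argument.
        String name = args.length > 0
                ? args[0]
                : "org.apache.cassandra.hadoop.ColumnFamilyInputFormat";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If this reports false in the launching JVM, the usual fix is the one suggested in this thread: put a local copy of the JAR on HADOOP_CLASSPATH on that machine before running "hadoop jar".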

On Wed, Oct 13, 2010 at 4:38 PM, Michael Moores <mmoores@real.com> wrote:

> Add it to HADOOP_CLASSPATH on all machines running the task?
> I can try that, but I'd like users to be able to execute jobs using jars
> from their own hdfs directory.
>
>
> On Oct 13, 2010, at 4:21 PM, Shrijeet Paliwal wrote:
>
> > How about adding it to HADOOP_CLASSPATH, if it is not there already?
> >
> > On Wed, Oct 13, 2010 at 4:15 PM, Michael Moores <mmoores@real.com> wrote:
> >
> >> FYI, I also tried the archive version:
> >>
> >> calling DistributedCache.addArchiveToClassPath(path, configuration);
> >>
> >> On Oct 13, 2010, at 4:12 PM, Michael Moores wrote:
> >>
> >>> I have specified my InputFormat to be the Cassandra ColumnFamilyInputFormat, and also
> >>> added the Cassandra JAR to my classpath via a call to DistributedCache.addFileToClassPath().
> >>> The JAR exists on the HDFS.
> >>> When I run my jar I get java.lang.NoClassDefFoundError:
> >>> org/apache/cassandra/hadoop/ColumnFamilyInputFormat at the line that
> >>> makes the job.setInputFormatClass() call.
> >>>
> >>> I execute the job with "hadoop jar <myjar>".
> >>>
> >>> Will I need to put my Cassandra JAR on each machine and add it to the JVM startup options?
> >>>
> >>> Here is a code snippet:
> >>>
> >>> public class MyStats extends Configured implements Tool {
> >>> ...
> >>>  public static void main(String[] args) throws Exception {
> >>>       // Let ToolRunner handle generic command-line options
> >>>       Configuration configuration = new Configuration();
> >>>       Path path = new Path("/user/hadoop/profilestats/cassandra-0.7.0-beta2.jar");
> >>>       log.info("main: adding jars...");
> >>>       DistributedCache.addFileToClassPath(path, configuration);
> >>>
> >>>       // Propagate the tool's exit status instead of always exiting with 0
> >>>       System.exit(ToolRunner.run(configuration, new MyStats(), args));
> >>>   }
> >>>
> >>>  public int run(String[] args) throws Exception {
> >>>     Job job = new Job(getConf(), "myjob");
> >>>
> >>>     job.setInputFormatClass(org.apache.cassandra.hadoop.ColumnFamilyInputFormat.class);
> >>>     ..
> >>>     // run() must return an int status: 0 on success, 1 on failure
> >>>     return job.waitForCompletion(true) ? 0 : 1;
> >>>  }
> >>>
> >>>
> >>> FILE LISTING from HDFS:
> >>>
> >>> [hadoop@kv-app02 ~]$ hadoop dfs -lsr
> >>> 10/10/13 14:57:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
> >>> 10/10/13 14:57:48 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> >>> drwxr-xr-x   - hadoop supergroup          0 2010-10-13 14:34 /user/hadoop/profilestats
> >>> -rw-r--r--   3 hadoop supergroup    1841467 2010-10-13 14:34 /user/hadoop/profilestats/cassandra-0.7.0-beta2.jar
> >>
> >>
>
>
