hadoop-common-user mailing list archives

From Reik Schatz <reik.sch...@bwin.org>
Subject Re: using StreamInputFormat, StreamXmlRecordReader with your custom Jobs
Date Thu, 11 Mar 2010 08:14:50 GMT
Uh, do I have to copy the jar file manually into HDFS before I invoke 
the hadoop jar command starting my own job?

Utkarsh Agarwal wrote:

> I think you can use DistributedCache to specify the location of the jar
> once you have it in HDFS.
> On Wed, Mar 10, 2010 at 6:11 AM, Reik Schatz <reik.schatz@bwin.org> wrote:
>> Hi, I am playing around with version 0.20.2 of Hadoop. I have written and
>> packaged a Job using a custom Mapper and Reducer. The input format in my Job
>> is set to StreamInputFormat, and I am also setting the property
>> stream.recordreader.class to org.apache.hadoop.streaming.StreamXmlRecordReader.
>> This is how I want to start my job:
>> hadoop jar custom-1.0-SNAPSHOT.jar EmailCountingJob /input /output
>> The problem is that in this case all classes from
>> hadoop-0.20.2-streaming.jar are missing (ClassNotFoundException). I tried
>> using -libjars without luck.
>> hadoop jar -libjars PATH/hadoop-0.20.2-streaming.jar
>> custom-1.0-SNAPSHOT.jar EmailCountingJob /input /output
>> Is there any way to use the streaming classes with your own jobs without
>> copying those classes into your project and packaging them into your own jar?
>> /Reik
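[Editor's note: a common reason `-libjars` is ignored in this situation is argument order and parsing. With `hadoop jar`, generic options such as `-libjars` must come after the main class name, and they only take effect if the driver passes its arguments through GenericOptionsParser, typically by implementing Tool and launching via ToolRunner. A minimal sketch of such a driver against the 0.20.2 `mapred` API is below; the `EmailCountingJob` name comes from the thread, while the `<email>`/`</email>` record delimiters and the mapper/reducer are assumptions. StreamXmlRecordReader additionally requires the stream.recordreader.begin/end properties to mark record boundaries.]

```java
// Sketch only: requires the Hadoop 0.20.2 core and streaming jars on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class EmailCountingJob extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        // getConf() carries the configuration already populated by
        // GenericOptionsParser, so -libjars has been applied at this point.
        JobConf conf = new JobConf(getConf(), EmailCountingJob.class);
        conf.setJobName("email-counting");

        conf.setInputFormat(StreamInputFormat.class);
        conf.set("stream.recordreader.class",
                 "org.apache.hadoop.streaming.StreamXmlRecordReader");
        // Assumed XML record delimiters; adjust to the actual input format.
        conf.set("stream.recordreader.begin", "<email>");
        conf.set("stream.recordreader.end", "</email>");

        // conf.setMapperClass(...); conf.setReducerClass(...); etc.

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner invokes GenericOptionsParser, which strips -libjars
        // (and -D, -files, ...) before run(args) sees the remaining arguments.
        System.exit(ToolRunner.run(new Configuration(), new EmailCountingJob(), args));
    }
}
```

With a driver like this, the invocation from the thread should work once the generic options are placed after the class name:

    hadoop jar custom-1.0-SNAPSHOT.jar EmailCountingJob \
        -libjars PATH/hadoop-0.20.2-streaming.jar /input /output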
