hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: hadoop streaming with custom RecordReader class
Date Thu, 18 Oct 2012 04:53:08 GMT
Hi Jason,

A few questions (in order):

1. Does Hadoop's own NLineInputFormat not suffice?
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html

2. Do you make sure to pass your jar into the front-end too?

$ export HADOOP_CLASSPATH=/path/to/your/jar
$ command…

3. Does the output of jar -tf <yourjar> list a proper mypackage/NLineRecordReader.class entry?

4. Is your class marked public?
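
Separately from the classpath issue, here is a minimal sketch of the four-lines-per-record grouping that a modified next() would have to perform. It is plain Java with no Hadoop dependency, so it is not a RecordReader. The class name FourLineGrouper, the helpers nextRecord and readAll, and the choice of a tab to join the lines are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: groups consecutive lines into one record.
class FourLineGrouper {

    // Reads up to linesPerRecord lines and joins them with tabs (assumed separator).
    // Returns null once the input is exhausted; a short final record is returned as-is.
    static String nextRecord(BufferedReader in, int linesPerRecord) throws IOException {
        StringBuilder record = new StringBuilder();
        int linesRead = 0;
        String line;
        while (linesRead < linesPerRecord && (line = in.readLine()) != null) {
            if (linesRead > 0) {
                record.append('\t');
            }
            record.append(line);
            linesRead++;
        }
        return linesRead == 0 ? null : record.toString();
    }

    // Convenience wrapper that collects every record from an in-memory string.
    static List<String> readAll(String text, int linesPerRecord) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String record;
            while ((record = nextRecord(in, linesPerRecord)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Eight input lines become two records of four lines each.
        for (String record : readAll("a\nb\nc\nd\ne\nf\ng\nh\n", 4)) {
            System.out.println(record);
        }
    }
}
```

This sketch does not deal with input splits. A real reader would still need to make sure each split starts on a record boundary, which is the part that produces the partial records described below.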

On Thu, Oct 18, 2012 at 9:32 AM, Jason Wang <jason.j.wang@gmail.com> wrote:
> Hi all,
> I'm experimenting with hadoop streaming on build 1.0.3.
>
> To give background info, I'm streaming a text file into a mapper written in C.
> Using the default settings, streaming uses TextInputFormat which creates one
> record from each line.  The problem I am having is that I need record
> boundaries to be every 4 lines.  When the splitter breaks up the input into
> the mappers, I have partial records on the boundaries due to this.  To
> address this, my approach was to write a new RecordReader class in Java
> that is almost identical to LineRecordReader, but with a modified
> next() method that reads 4 lines instead of one.
>
> I then compiled the new class and created a jar.  I wanted to import this at
> run time using the -libjars argument, like so:
>
> hadoop jar ../contrib/streaming/hadoop-streaming-1.0.3.jar -libjars
> NLineRecordReader.jar -files test_stream.sh -inputreader
> mypackage.NLineRecordReader -input /Users/hadoop/test/test.txt -output
> /Users/hadoop/test/output -mapper "test_stream.sh" -reducer NONE
>
> Unfortunately, I keep getting the following error:
> -inputreader: class not found: mypackage.NLineRecordReader
>
> My question is twofold.  Am I using the right approach to handle the 4-line
> records with the custom RecordReader implementation?  And why isn't -libjars
> working to include my class to hadoop streaming at runtime?
>
> Thanks,
> Jason



-- 
Harsh J
