hadoop-mapreduce-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: How to write a custom input format and record reader to read multiple lines of text from files
Date Tue, 01 Dec 2009 08:23:34 GMT
Hi,
The NLineInputFormat (o.a.h.mapreduce.lib.input) does more or less the same thing, and
should help guide you in writing a custom input format :)

Amogh


On 12/1/09 11:47 AM, "Kunal Gupta" <kunal@techlead-india.com> wrote:

Can someone explain how to override the "FileInputFormat" and
"RecordReader" in order to be able to read multiple lines of text from
input files in a single map task?

Here the key will be the offset of the first line of text and value will
be the N lines of text.
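
The grouping described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration of the key/value logic, under the assumption that the input is UTF-8 text with single-byte "\n" line terminators; the names NLineGrouper and group are hypothetical. In an actual RecordReader, comparable logic would presumably live in nextKeyValue(), producing one record per call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of the grouping logic; NOT a Hadoop RecordReader. */
final class NLineGrouper {

    /**
     * Groups the input into records of up to n lines.
     * Key: byte offset of the record's first line.
     * Value: the record's lines joined by '\n'.
     * Assumption: UTF-8 text, each line terminated by a single '\n' byte.
     */
    static Map<Long, String> group(String input, int n) throws IOException {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<Long, String> records = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            long offset = 0;        // byte offset of the line about to be read
            long recordStart = 0;   // byte offset of the current record's first line
            int linesInRecord = 0;
            StringBuilder value = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (linesInRecord == 0) {
                    recordStart = offset;
                } else {
                    value.append('\n');
                }
                value.append(line);
                linesInRecord++;
                // Advance past the line's bytes plus its '\n' terminator.
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
                if (linesInRecord == n) {
                    records.put(recordStart, value.toString());
                    value.setLength(0);
                    linesInRecord = 0;
                }
            }
            // Emit a final, shorter record if the line count is not a multiple of n.
            if (linesInRecord > 0) {
                records.put(recordStart, value.toString());
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        group("a\nbb\nccc\ndddd\ne\n", 2)
            .forEach((key, val) -> System.out.println(key + " -> " + val.replace("\n", "|")));
    }
}
```

With N = 2 and the five-line sample in main, the keys come out as 0, 5 and 14, and the last record holds the single leftover line.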

I have overridden the class FileInputFormat:

public class MultiLineFileInputFormat
        extends FileInputFormat<LongWritable, Text>{
...
}

and implemented the abstract method:

public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
                TaskAttemptContext context)
         throws IOException, InterruptedException {...}

I have also overridden the recordreader class:

public class MultiLineFileRecordReader
        extends RecordReader<LongWritable, Text>
{...}

and in the job configuration, specified this new InputFormat class:

job.setInputFormatClass(MultiLineFileInputFormat.class);

--------------------------------------------------------------------------
When I run this new map/reduce program, I get the following Java error:
--------------------------------------------------------------------------
Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at CustomRecordReader.main(CustomRecordReader.java:257)
Caused by: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
        at java.lang.Class.getConstructor0(Class.java:2706)
        at java.lang.Class.getDeclaredConstructor(Class.java:1985)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
        ... 5 more
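
Judging by the "$" in CustomRecordReader$MultiLineFileInputFormat, the input format appears to be declared inside the CustomRecordReader class. The trace shows ReflectionUtils.newInstance calling Class.getDeclaredConstructor with no arguments, so one likely cause (an assumption, since the full source is not shown) is that the nested class is a non-static inner class: its constructor implicitly takes the enclosing instance, so no zero-argument constructor exists. The sketch below reproduces that behaviour with plain reflection; OuterJob, InnerFormat and NestedFormat are hypothetical stand-ins, not the poster's classes.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for the outer job class. */
final class OuterJob {

    /** Non-static inner class: its only constructor implicitly takes an OuterJob. */
    class InnerFormat { }

    /** Static nested class: has a genuine no-argument constructor. */
    static class NestedFormat { }

    /** Returns true if clazz can be instantiated through a no-argument constructor. */
    static boolean hasNoArgConstructor(Class<?> clazz) {
        try {
            Constructor<?> ctor = clazz.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            // Same exception type as in the stack trace above.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(InnerFormat.class));  // false
        System.out.println(hasNoArgConstructor(NestedFormat.class)); // true
    }
}
```

If this is indeed the cause, declaring the nested input format (and record reader) as static, or moving them to top-level classes, should let the framework instantiate them. A class that declares only constructors with arguments would fail the same way.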


