hadoop-mapreduce-issues mailing list archives

From "Kunal Gupta (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1255) How to write a custom input format and record reader to read multiple lines of text from files
Date Tue, 01 Dec 2009 06:07:20 GMT
How to write a custom input format and record reader to read multiple lines of text from files
----------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1255
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1255
             Project: Hadoop Map/Reduce
          Issue Type: Task
    Affects Versions: 0.20.1
         Environment: Ubuntu, 32 bit system. Apache hadoop 0.20.1
            Reporter: Kunal Gupta
            Priority: Minor


Can someone explain how to override the "FileInputFormat" and "RecordReader" in order to be
able to read multiple lines of text from input files in a single map task?

Here the key will be the offset of the first line of text and the value will be the N lines of
text.
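
To make the intended records concrete, here is a minimal sketch of the grouping described above. It uses plain Java only and is not Hadoop's RecordReader API; the class name NLineGrouping and the method group are hypothetical, and the sketch assumes UTF-8 input in which every line ends with a single '\n'.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record reader's core loop; it uses no Hadoop classes.
class NLineGrouping {

    // Groups the input into records of up to n lines each.
    // Key: byte offset of the record's first line. Value: the lines joined by '\n'.
    // Assumption: UTF-8 text in which every line ends with a single '\n'.
    static Map<Long, String> group(String input, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<Long, String> records = new LinkedHashMap<>();
        List<String> buffer = new ArrayList<>();
        long offset = 0;      // byte offset of the next unread line
        long recordStart = 0; // byte offset of the first line in the buffer
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (buffer.isEmpty()) {
                    recordStart = offset;
                }
                buffer.add(line);
                // +1 accounts for the '\n' that readLine() strips.
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
                if (buffer.size() == n) {
                    records.put(recordStart, String.join("\n", buffer));
                    buffer.clear();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A final record may hold fewer than n lines.
        if (!buffer.isEmpty()) {
            records.put(recordStart, String.join("\n", buffer));
        }
        return records;
    }

    public static void main(String[] args) {
        // Five lines grouped two at a time give keys 0, 5 and 14.
        System.out.println(group("a\nbb\nccc\ndddd\neeeee\n", 2).keySet());
    }
}
```

In a real record reader the same bookkeeping would have to respect split boundaries, which this sketch ignores.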

I have extended the class FileInputFormat:

public class MultiLineFileInputFormat
	extends FileInputFormat<LongWritable, Text>{
...
}

and implemented the abstract method:

public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
                TaskAttemptContext context)
         throws IOException, InterruptedException {...}

I have also extended the RecordReader class:

public class MultiLineFileRecordReader extends RecordReader<LongWritable, Text>
{...}

and in the job configuration, specified this new InputFormat class:

job.setInputFormatClass(MultiLineFileInputFormat.class);

When I run this new map/reduce program, I get the following Java error:

Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
	at CustomRecordReader.main(CustomRecordReader.java:257)
Caused by: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
	at java.lang.Class.getConstructor0(Class.java:2706)
	at java.lang.Class.getDeclaredConstructor(Class.java:1985)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
	... 5 more
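
A note on the trace above: the `$` in CustomRecordReader$MultiLineFileInputFormat indicates a class nested inside CustomRecordReader, and the failure comes from a reflective lookup of a zero-argument constructor (Class.getDeclaredConstructor). One possible cause, assuming the nested class is declared without the `static` modifier, is that a non-static inner class has no true no-argument constructor, because its constructor takes the enclosing instance as a hidden parameter. The sketch below reproduces that behaviour in plain Java; the names ConstructorCheck, InnerFormat and NestedFormat are hypothetical and no Hadoop classes are involved.

```java
// Hypothetical demonstration; the class names are illustrative only.
class ConstructorCheck {

    // Non-static inner class: its constructor takes a hidden ConstructorCheck argument.
    class InnerFormat {}

    // Static nested class: it has a genuine zero-argument constructor.
    static class NestedFormat {}

    // Returns true if the class declares a constructor that takes no arguments.
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(InnerFormat.class));  // prints "false"
        System.out.println(hasNoArgConstructor(NestedFormat.class)); // prints "true"
    }
}
```

If that assumption holds for the job above, declaring MultiLineFileInputFormat as a static nested class, or moving it to its own top-level class, would be the first thing to try.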


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

