hadoop-mapreduce-user mailing list archives

From Kunal Gupta <ku...@techlead-india.com>
Subject RE: How to write a custom input format and record reader to read multiple lines of text from files
Date Tue, 01 Dec 2009 09:57:03 GMT
I am extending the class FileInputFormat. This class has an
abstract method, createRecordReader. I have implemented that method,
but running the program still gives me constructor errors.

I tried passing FileInputFormat itself as my InputFormat class in the job
configuration, and as expected it gave me an initialization error:

(InstantiationExceptionConstructorAccessorImpl.java:30)

But I was hoping that, after implementing the abstract method of the
class FileInputFormat, this issue would not arise.

What can I do to correctly extend the FileInputFormat class and use it
for my custom InputFormat?
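
One possible cause, offered as a sketch and not a confirmed diagnosis: the class name in the stack trace quoted below, CustomRecordReader$MultiLineFileInputFormat, suggests the InputFormat is declared inside another class. If it is a non-static inner class, its constructor implicitly takes the enclosing instance, so a reflective lookup of a no-argument constructor fails with NoSuchMethodException, while an abstract class fails with InstantiationException. The plain-JDK example below reproduces both cases without Hadoop. All names in it (ReflectionDemo, BaseFormat, InnerFormat, NestedFormat, tryInstantiate) are hypothetical stand-ins, not Hadoop classes.

```java
import java.lang.reflect.Constructor;

public class ReflectionDemo {
    // Hypothetical stand-in for an abstract base class such as FileInputFormat.
    public abstract static class BaseFormat {
        public abstract String createReader();
    }

    // Non-static inner class: its constructor implicitly takes the
    // enclosing ReflectionDemo instance, so it has no no-arg constructor.
    public class InnerFormat extends BaseFormat {
        @Override
        public String createReader() { return "inner"; }
    }

    // Static nested class: has a genuine no-arg constructor.
    public static class NestedFormat extends BaseFormat {
        @Override
        public String createReader() { return "nested"; }
    }

    // Looks up the no-arg constructor reflectively and invokes it.
    // Returns "ok" on success, otherwise the simple name of the exception.
    public static String tryInstantiate(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return "ok";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("abstract base: " + tryInstantiate(BaseFormat.class));
        System.out.println("inner class:   " + tryInstantiate(InnerFormat.class));
        System.out.println("static nested: " + tryInstantiate(NestedFormat.class));
    }
}
```

If this is indeed the cause, declaring the InputFormat (and the RecordReader) as "public static class", or moving each into its own top-level file, should let the framework instantiate them.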

On Tue, 2009-12-01 at 10:21 +0100, guillaume.viland@orange-ftgroup.com
wrote:
> I've developed a version of a MultipleLineTextInputFormat for hadoop 0.19. I think it
> is not perfect but it works for my needs.
> I've attached the code, feel free to improve or use it. Do not hesitate to contact me
> if you improve the code.
> 
> 
> 
> 
> -----Original Message-----
> From: Kunal Gupta [mailto:kunal@techlead-india.com] 
> Sent: Tuesday, 1 December 2009 09:50
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: How to write a custom input format and record reader to read multiple lines
> of text from files
> 
> NLineInputFormat will help in splitting N lines of text for each Mapper,
> but it will still pass a single line of text to each call to the Map
> function.
> 
> I want N lines of text to be passed as 'value' to the Map function.
> 
> By extending the FileInputFormat and RecordReader classes, I am concatenating
> N lines of text and setting that as the 'value'.
> 
> But this program is not running, probably because of some initialization error.
> 
> I am telling the framework to use my extended class as the InputFormat:
> 
> job.setInputFormatClass(MultiLineFileInputFormat.class);
> 
> On Tue, 2009-12-01 at 13:53 +0530, Amogh Vasekar wrote:
> > Hi,
> > The NLineInputFormat (o.a.h.mapreduce.lib.input) achieves more or less
> > the same, and should help guide you in writing a custom input format :)
> > 
> > Amogh
> > 
> > 
> > On 12/1/09 11:47 AM, "Kunal Gupta" <kunal@techlead-india.com> wrote:
> > 
> >         Can someone explain how to override the "FileInputFormat" and
> >         "RecordReader" in order to be able to read multiple lines of
> >         text from input files in a single map task?
> >         
> >         Here the key will be the offset of the first line of text and
> >         the value will be the N lines of text.
> >         
> >         I have overridden the class FileInputFormat:
> >         
> >         public class MultiLineFileInputFormat
> >                 extends FileInputFormat<LongWritable, Text>{
> >         ...
> >         }
> >         
> >         and implemented the abstract method:
> >         
> >         public RecordReader<LongWritable, Text> createRecordReader(
> >                         InputSplit split, TaskAttemptContext context)
> >                  throws IOException, InterruptedException {...}
> >         
> >         I have also extended the RecordReader class:
> >         
> >         public class MultiLineFileRecordReader
> >                 extends RecordReader<LongWritable, Text>
> >         {...}
> >         
> >         and in the job configuration, specified this new InputFormat
> >         class:
> >         
> >         job.setInputFormatClass(MultiLineFileInputFormat.class);
> >         
> >         --------------------------------------------------------------------------
> >         When I run this new map/reduce program, I get the following
> >         Java error:
> >         --------------------------------------------------------------------------
> >         Exception in thread "main" java.lang.RuntimeException: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
> >                 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
> >                 at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
> >                 at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> >                 at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> >                 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
> >                 at CustomRecordReader.main(CustomRecordReader.java:257)
> >         Caused by: java.lang.NoSuchMethodException: CustomRecordReader$MultiLineFileInputFormat.<init>()
> >                 at java.lang.Class.getConstructor0(Class.java:2706)
> >                 at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> >                 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
> >                 ... 5 more
> >         
> >         
> 
> 
> 
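
For the goal described in the thread (key = offset of the first line, value = N lines concatenated), the core loop a custom RecordReader would need can be sketched in plain Java. This is only an illustration of the grouping logic, not the attached MultipleLineTextInputFormat and not Hadoop API code: MultiLineGrouper, Record and group are hypothetical names, and the sketch assumes UTF-8 text with single '\n' line terminators and ignores input-split boundaries, which a real RecordReader would have to handle.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MultiLineGrouper {
    // Hypothetical record type: key is the byte offset of the first line,
    // value is up to N lines joined with '\n'.
    public static final class Record {
        public final long key;
        public final String value;

        Record(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Groups consecutive lines into records of n lines each; the last
    // record may hold fewer than n lines.
    public static List<Record> group(BufferedReader in, int n) throws IOException {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<Record> records = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        long offset = 0;       // byte offset of the line about to be read
        long firstOffset = 0;  // byte offset of the first line in the current group
        int count = 0;         // lines collected in the current group
        String line;
        while ((line = in.readLine()) != null) {
            if (count == 0) {
                firstOffset = offset;
            } else {
                value.append('\n');
            }
            value.append(line);
            // Assumption: each line ends with exactly one '\n' byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            count++;
            if (count == n) {
                records.add(new Record(firstOffset, value.toString()));
                value.setLength(0);
                count = 0;
            }
        }
        if (count > 0) {
            records.add(new Record(firstOffset, value.toString()));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a\nbb\nccc\ndddd\neeeee\n"));
        for (Record r : group(in, 2)) {
            System.out.println(r.key + "\t" + r.value.replace("\n", "\\n"));
        }
    }
}
```

In a real RecordReader the same bookkeeping would live in nextKeyValue(), with the key and value handed back through getCurrentKey() and getCurrentValue().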

