hadoop-mapreduce-user mailing list archives

From Chen He <airb...@gmail.com>
Subject Re: Custom InputFormat error
Date Wed, 29 Aug 2012 07:57:18 GMT

Hi Harsh,

Thank you for your reply. Do you mean I need to change the FileSplit to
avoid the errors I mentioned?
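
If I read the newlines logic on that wiki page correctly, the split itself
can stay as it is and the record reader has to decide which reader owns a
record that crosses a split boundary. Below is a minimal standalone sketch
of that rule as I understand it. It is only a simulation over a byte array,
not my actual RecordReader and not Hadoop code. The class name
SplitBoundarySketch and the methods readSplit and readAll are made up for
illustration, and I am assuming a reader owns every record whose #Header#
line starts after its split start and at or before its split end, then
reads past the end until it finds #Trailer#.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of split-boundary handling; not Hadoop API code.
class SplitBoundarySketch {
    static final String HEADER = "#Header#";
    static final String TRAILER = "#Trailer#";

    // Index just past the next '\n' at or after pos, or data.length if none.
    static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') pos++;
        return Math.min(pos + 1, data.length);
    }

    // The line beginning at pos, without its trailing newline.
    static String lineAt(byte[] data, int pos) {
        int end = pos;
        while (end < data.length && data[end] != '\n') end++;
        return new String(data, pos, end - pos, StandardCharsets.US_ASCII);
    }

    // Records owned by the split covering bytes start..end of the file.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // A non-first split always discards its first (possibly partial)
        // line, because the previous split reads up to and including the
        // line that starts at its own end offset.
        if (start != 0) pos = nextLineStart(data, start);
        while (pos < data.length && pos <= end) {
            String line = lineAt(data, pos);
            pos = nextLineStart(data, pos);
            if (!line.equals(HEADER)) continue; // tail of someone else's record
            StringBuilder record = new StringBuilder(line).append('\n');
            boolean closed = false;
            // Keep reading, even past 'end', until the trailer is found.
            while (pos < data.length && !closed) {
                String body = lineAt(data, pos);
                pos = nextLineStart(data, pos);
                record.append(body).append('\n');
                closed = body.equals(TRAILER);
            }
            if (closed) records.add(record.toString());
        }
        return records;
    }

    // Cuts the data into numSplits byte ranges (assumes numSplits <= data.length)
    // and concatenates what each simulated reader returns.
    static List<String> readAll(byte[] data, int numSplits) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            int start = (int) ((long) i * data.length / numSplits);
            int end = (int) ((long) (i + 1) * data.length / numSplits);
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        String file = "#Header#\na\nb\n#Trailer#\n"
                + "#Header#\n#Trailer#\n"
                + "#Header#\nx\ny\nz\n#Trailer#\n";
        byte[] data = file.getBytes(StandardCharsets.US_ASCII);
        // 3 records read through 4 splits, the case that fails for me.
        System.out.println("records read: " + readAll(data, 4).size());
    }
}
```

With this rule some splits may return no records at all and others read
beyond their end offset, but each record should be produced exactly once
regardless of how many mappers are used.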

Regards!

Chen

On Wed, Aug 29, 2012 at 2:46 AM, Harsh J <harsh@cloudera.com> wrote:

> Hi Chen,
>
> Do your record reader and mapper handle the case where a map split
> does not contain a whole record? Your case is not very different
> from the newlines logic presented here:
> http://wiki.apache.org/hadoop/HadoopMapReduce
>
> On Wed, Aug 29, 2012 at 11:13 AM, Chen He <airbots@gmail.com> wrote:
> > Hi guys
> >
> > I ran into an interesting problem while implementing my own custom
> > InputFormat, which extends FileInputFormat. (I rewrote the RecordReader
> > class but not the InputSplit class.)
> >
> > My RecordReader extends LineRecordReader and treats the following
> > format as a basic record. It returns a record when it reaches #Trailer#
> > and the record contains #Header#. I have only one input file, which is
> > composed of many such basic records:
> >
> > #Header#
> > .....(many lines, may be 0 lines or 1000 lines, it varies)
> > #Trailer#
> >
> > Everything works fine if the number of basic records in the file is an
> > integer multiple of the number of mappers. For example, I use 2 mappers
> > and there are 2 basic records in my input file, or I use 3 mappers and
> > there are 6 basic records in the input file.
> >
> > However, if I use 4 mappers and there are 3 basic records in the input
> > file (not an integer multiple), the final output is incorrect. The
> > "Map Input Bytes" job counter is also less than the input file size.
> > How can I fix it? Do I need to rewrite the InputSplit?
> >
> > Any reply will be appreciated!
> >
> > Regards!
> >
> > Chen
>
>
>
> --
> Harsh J
>
