hadoop-mapreduce-user mailing list archives

From: Tim Fletcher <zigomushy@gmail.com>
Subject: Re: SequenceFileInputFormat doesn't return whole records
Date: Fri, 19 Aug 2011 14:19:46 GMT
Harsh, that was exactly the issue!

Thanks very much for your help.
Tim

On 19 August 2011 15:15, Harsh J <harsh@cloudera.com> wrote:

> Tim,
>
> Do you also set your I/O formats explicitly to SequenceFileInputFormat
> and SequenceFileOutputFormat? Via job.setInputFormat/setOutputFormat I
> mean.
>
> Hadoop should not be splitting records across maps/mappers. There are
> specific test cases that ensure this does not happen, so it would seem
> strange if it does this.
>
> On Fri, Aug 19, 2011 at 6:01 PM, Tim Fletcher <zigomushy@gmail.com> wrote:
> > Hi all,
> >
> > I am having issues using SequenceFileInputFormat to retrieve whole
> > records. I have one job that is used to write to a SequenceFile:
> >
> > SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
> > SequenceFileOutputFormat.setOutputCompressionType(job,
> >     SequenceFile.CompressionType.NONE);
> >
> > I then have a second job that is meant to read the file for
> > processing:
> >
> > SequenceFileInputFormat.addInputPath(job, new Path("out/data"));
> >
> > However, the values that I get as the arguments to the Map part of
> > my job only seem to contain parts of the record. I am sure that I am
> > missing something rather fundamental about how Hadoop splits inputs
> > to the Mapper, but I can't seem to find a way to stop the records
> > being split.
> >
> > Any help (or a pointer to a specific page in the docs) would be
> > greatly appreciated.
> >
> > Regards,
> > Tim
>
>
>
> --
> Harsh J
>
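For anyone who finds this thread later, the fix Harsh suggests looks
roughly like the sketch below, written against the new mapreduce API
that the snippets above use (circa Hadoop 0.20/0.21). The job names,
key/value types, and the omitted mapper/reducer wiring are illustrative
assumptions, not details from the thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SequenceFileJobSetup {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();

    // Job 1: writes whole key/value records to a SequenceFile.
    // LongWritable/Text are assumed types for the sketch; the mapper,
    // reducer, and job submission are omitted for brevity.
    Job writeJob = new Job(conf, "write-seqfile");
    writeJob.setOutputFormatClass(SequenceFileOutputFormat.class);
    writeJob.setOutputKeyClass(LongWritable.class);
    writeJob.setOutputValueClass(Text.class);
    SequenceFileOutputFormat.setOutputPath(writeJob, new Path("out/data"));
    SequenceFileOutputFormat.setOutputCompressionType(writeJob,
        SequenceFile.CompressionType.NONE);

    // Job 2: reads the file back. setInputFormatClass is the crucial
    // call: without it the job falls back to the default
    // TextInputFormat, which splits the binary file on newlines and
    // hands the mapper fragments instead of whole records.
    Job readJob = new Job(conf, "read-seqfile");
    readJob.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPath(readJob, new Path("out/data"));
  }
}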
