hadoop-mapreduce-user mailing list archives

From Tim Fletcher <zigomu...@gmail.com>
Subject SequenceFileInputFormat doesn't return whole records
Date Fri, 19 Aug 2011 12:31:31 GMT
Hi all,

I am having issues using SequenceFileInputFormat to retrieve whole records.

I have one job that writes to a SequenceFile:

SequenceFileOutputFormat.setOutputPath(job, new Path("out/data"));
SequenceFileOutputFormat.setOutputCompressionType(job,
    SequenceFile.CompressionType.NONE);

I then have a second job that is meant to read the file for processing:

SequenceFileInputFormat.addInputPath(job, new Path("out/data"));

However, the values that I get as the arguments to the map function of my
job only seem to contain parts of a record. I am sure that I am missing
something rather fundamental about how Hadoop splits inputs to the Mapper,
but I can't seem to find a way to stop the records being split.
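
One thing worth ruling out, offered as an assumption rather than a diagnosis: in the org.apache.hadoop.mapreduce API, addInputPath is a static method inherited from FileInputFormat and only registers the path. The format used to read it is chosen separately with job.setInputFormatClass(SequenceFileInputFormat.class), and the default is the line-oriented TextInputFormat. The sketch below is a plain-Java analogy, not Hadoop code and not the SequenceFile layout: the class RecordSplitDemo and its length-prefixed format are hypothetical, and it only shows how binary records come back as fragments when a file is read line by line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the length-prefixed layout is an illustration only.
class RecordSplitDemo {

    // Write each record as a 4-byte length followed by its UTF-8 bytes.
    static byte[] writeRecords(List<String> records) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            for (String record : records) {
                byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Record-aware reader: uses the length prefix, so records come back whole.
    static List<String> readByLength(byte[] data) {
        try {
            List<String> records = new ArrayList<>();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            while (in.available() > 0) {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                records.add(new String(bytes, StandardCharsets.UTF_8));
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Line-oriented reader: ignores the record structure and cuts the data
    // wherever a newline byte happens to occur.
    static List<String> readByLine(byte[] data) {
        try {
            List<String> lines = new ArrayList<>();
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first record contains a newline, as binary values often do.
        byte[] data = writeRecords(Arrays.asList("a\nb", "c"));
        System.out.println("record-aware reader: " + readByLength(data).size() + " records");
        System.out.println("line-oriented reader: " + readByLine(data).size() + " fragments");
    }
}
```

The record-aware reader returns the two records intact, while the line-oriented reader returns pieces that start or end in the middle of a record, which resembles the behaviour described above.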

Any help (or a pointer to a specific page in the docs) would be greatly
appreciated.

Regards,
Tim
