hadoop-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Input splits for sequence file input
Date Mon, 03 Dec 2012 02:08:03 GMT
The record reader returned by createRecordReader handles the record boundary
issue. You can check the code for details.
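
As a rough illustration of the idea, here is a toy model in Python. It is not
Hadoop's code, and the names read_split, offsets_of, SYNC and SYNC_SIZE are
hypothetical. It assumes the behaviour I understand the sequence file record
reader to have: a reader whose split does not start at byte 0 skips forward to
the first sync marker at or after its split start, and every reader keeps
reading past its split end until it reaches the next sync marker.

```python
SYNC = object()   # hypothetical stand-in for a sync marker
SYNC_SIZE = 4     # assumed marker size in bytes, for this toy model only


def offsets_of(entries):
    """Byte offset at which each entry (record or sync marker) starts."""
    offsets, pos = [], 0
    for entry in entries:
        offsets.append(pos)
        pos += SYNC_SIZE if entry is SYNC else len(entry)
    return offsets


def read_split(entries, start, end):
    """Return the records a reader for the byte range [start, end) emits."""
    offsets = offsets_of(entries)
    i = 0
    if start > 0:
        # Skip ahead to the first sync marker at or after the split start.
        i = next(
            (k for k, e in enumerate(entries) if e is SYNC and offsets[k] >= start),
            len(entries),
        )
    records = []
    while i < len(entries):
        pos = offsets[i]          # position before reading the next record
        sync_seen = False
        while i < len(entries) and entries[i] is SYNC:
            sync_seen = True      # a sync marker precedes the next record
            i += 1
        if i == len(entries):
            break                 # trailing sync marker, nothing left to read
        if pos >= end and sync_seen:
            break                 # past the split end at a sync point: stop
        records.append(entries[i])
        i += 1
    return records


if __name__ == "__main__":
    data = ["aaaaa", "bbbbb", SYNC, "ccccc", "ddddd", SYNC, "eeeee"]
    # "ccccc" occupies bytes 14..18 and so straddles the boundary at byte 16.
    print(read_split(data, 0, 16))   # ['aaaaa', 'bbbbb', 'ccccc', 'ddddd']
    print(read_split(data, 16, 33))  # ['eeeee']
```

In this model the first reader runs past its split end up to the next sync
marker and the second reader starts at that marker, so every record is read
exactly once. If the model matches the real reader, the behaviour is closest
to option (3) in the question below.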

On Mon, Dec 3, 2012 at 6:03 AM, Jeff LI <uniquejeff@gmail.com> wrote:

> Hello,
> I was reading about the relationship between input splits and HDFS blocks,
> and a question occurred to me:
> If a logical record crosses an HDFS block boundary, say between block#1 and
> block#2, does the mapper assigned this input split ask for (1) both
> blocks, or (2) block#1 and just the part of block#2 that this logical
> record extends into, or (3) block#1 and the part of block#2 up to some sync
> point that covers this particular logical record?  Note that the input is a
> sequence file.
> I guess my question really is: does Hadoop operate on a block basis, or
> does it respect some sort of logical structure within a block when it
> feeds input data to the mappers?
> Cheers
> Jeff

Best Regards

Jeff Zhang
