hadoop-user mailing list archives

From Mahesh Balija <balijamahesh....@gmail.com>
Subject Re: Input splits for sequence file input
Date Mon, 03 Dec 2012 06:51:26 GMT
Hi Jeff,

Beyond the HDFS blocks, there is something called an *InputSplit/FileSplit*
(the logical structure, in your terms).
A Mapper operates on an InputSplit through a RecordReader, and the
RecordReader is specific to the InputFormat.
The InputFormat supplies that RecordReader, which parses the input and
generates key-value pairs.

The InputFormat, through its RecordReader, also handles records that cross a
FileSplit boundary (i.e., that span different blocks), so the mapper still
receives each such record whole.
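
To make the boundary handling concrete, here is a small toy model in plain Python. It is only a sketch and not Hadoop code: it assumes newline-delimited records rather than a sequence file, and the names read_split and make_splits are made up for illustration. The convention it shows is that a record belongs to the split in which it starts, so a reader skips a partial record at the beginning of its split and reads past the end of its split to finish the last record. For sequence files, as far as I know, the sync markers serve as the resynchronization points in place of the newline used here.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the records owned by the split [start, end) of data.

    Toy model (an assumption, not the Hadoop API): records are
    newline-delimited, and a record is owned by the split that
    contains its first byte.
    """
    if start == 0:
        pos = 0
    else:
        # Skip the tail of a record that began in the previous split by
        # finding the first delimiter at or after byte start - 1.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []  # no record starts inside this split
        pos = nl + 1

    records = []
    # Read every record that starts before `end`, even if it finishes
    # beyond `end` (i.e., in the next split or block).
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


def make_splits(size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut [0, size) into fixed-size byte ranges, like block-sized splits."""
    return [(s, min(s + split_size, size)) for s in range(0, size, split_size)]


data = b"alpha\nbravo\ncharlie\ndelta\n"
splits = make_splits(len(data), 8)  # boundaries fall in the middle of records
per_split = [read_split(data, s, e) for s, e in splits]
all_records = [r for recs in per_split for r in recs]
```

With 8-byte splits, "bravo" starts in the first split and ends in the second, yet only the first reader returns it. Every record comes out exactly once across all splits, which is the property the real RecordReader has to guarantee.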

            Please check this link for more information,

Mahesh Balija,
Calsoft Labs.

On Mon, Dec 3, 2012 at 3:33 AM, Jeff LI <uniquejeff@gmail.com> wrote:

> Hello,
> I was reading about the relationship between input splits and HDFS blocks
> and a question came up:
> If a logical record crosses an HDFS block boundary, let's say block#1 and
> block#2, does the mapper assigned this input split ask for (1) both
> blocks, or (2) block#1 and just the part of block#2 that this logical
> record extends to, or (3) block#1 and part of block#2 up to some sync point
> that covers this particular logical record?  Note that the input is a
> sequence file.
> I guess my question really is: does Hadoop operate on a block basis, or
> does it respect some sort of logical structure within a block when it's
> trying to feed the mappers with input data?
> Cheers
> Jeff
