hadoop-mapreduce-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Input splits for sequence file input
Date Mon, 03 Dec 2012 05:52:56 GMT
This question is fundamentally flawed: it assumes that a mapper will ask for anything.

The mapper class's "run" method reads from a record reader. The question you really should ask is:

How does a RecordReader read records across block boundaries?

Jay Vyas 
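[Editor's note: Jay's reframed question can be illustrated with a toy model. The convention Hadoop's line-oriented readers follow is: every split except the first discards the partial record it lands in, and every split reads one record past its own end to finish the record it is in the middle of. The sketch below is plain Java, not the Hadoop API; `readSplit`, the newline delimiter, and the split offsets are all illustrative.]

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model of a line-oriented record reader over byte-range splits.
// A split [start, end) owns every record that STARTS at offset <= end,
// except the first record it lands in (unless start == 0), which the
// previous split already read by running past its own end.
public class SplitReaderDemo {

    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: unconditionally discard the (possibly
            // partial) record we landed in -- the previous split covers it.
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the delimiter
        }
        // Read whole records; the last one may extend beyond `end`,
        // i.e. physically live in the next HDFS block.
        while (pos < data.length && pos <= end) {
            int recStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            records.add(new String(data, recStart, pos - recStart,
                    StandardCharsets.UTF_8));
            pos++; // past the delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        // Split the "file" at byte 6, which falls in the middle of "bbbb".
        System.out.println(readSplit(data, 0, 6));           // [aaa, bbbb]
        System.out.println(readSplit(data, 6, data.length)); // [cc]
    }
}
```

Note that the split crossing "bbbb" still yields it exactly once: the first split reads past its end to finish it, and the second split skips it.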

On Dec 2, 2012, at 9:08 PM, Jeff Zhang <zjffdu@gmail.com> wrote:

> The createRecordReader method will handle the record boundary issue. You can check the code for details.
> On Mon, Dec 3, 2012 at 6:03 AM, Jeff LI <uniquejeff@gmail.com> wrote:
>> Hello,
>> I was reading about the relationship between input splits and HDFS blocks, and a question came up:
>> If a logical record crosses an HDFS block boundary, say between block#1 and block#2, does the mapper assigned this input split ask for (1) both blocks, (2) block#1 and just the part of block#2 that this logical record extends to, or (3) block#1 and the part of block#2 up to some sync point that covers this particular logical record? Note the input is a sequence file.
>> I guess my question really is: does Hadoop operate on a block basis, or does it respect some sort of logical structure within a block when it's trying to feed the mappers with input?
>> Cheers
>> Jeff
> -- 
> Best Regards
> Jeff Zhang
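
[Editor's note: for sequence files, the behavior matches option (3) in the question. SequenceFile writers embed sync markers every couple of kilobytes, and the reader for a split seeks forward to the first sync at or after the split's start, then reads records until the first sync at or after the split's end. The toy model below is illustrative only (not the real SequenceFile format): '#' characters stand in for sync markers, and records are the ';'-terminated runs between them.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of sequence-file sync handling. '#' = sync marker,
// records are ';'-terminated. A split [start, end) owns every record
// between the first sync at or after `start` and the first sync at or
// after `end` -- it reads into the next block up to a sync point.
public class SyncSplitDemo {

    // Offset of the first sync marker at or after `from`; EOF if none.
    static int nextSync(String data, int from) {
        int i = data.indexOf('#', from);
        return i < 0 ? data.length() : i;
    }

    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = nextSync(data, start);  // seek to first sync in the split
        int stop = nextSync(data, end);   // stop at first sync past the end
        while (pos < stop) {
            if (data.charAt(pos) == '#') { pos++; continue; } // skip marker
            int recEnd = data.indexOf(';', pos);
            records.add(data.substring(pos, recEnd));
            pos = recEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "#aa;bbb;#c;#ddd;ee;";
        // Byte-oriented split at offset 6 lands inside record "bbb".
        System.out.println(readSplit(data, 0, 6));             // [aa, bbb]
        System.out.println(readSplit(data, 6, data.length())); // [c, ddd, ee]
    }
}
```

Each split independently resolves its byte range to sync boundaries, so the splits partition the records with no overlap and no gaps, regardless of where the block boundary falls.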
