hadoop-mapreduce-user mailing list archives

From Jeff LI <uniquej...@gmail.com>
Subject Input splits for sequence file input
Date Sun, 02 Dec 2012 22:03:03 GMT

I was reading about the relationship between input splits and HDFS blocks,
and a question came to mind:

If a logical record crosses an HDFS block boundary, say between block#1 and
block#2, does the mapper assigned to this input split ask for (1) both
blocks, (2) block#1 and just the part of block#2 that this logical
record extends into, or (3) block#1 and the part of block#2 up to some sync
point that covers this particular logical record? Note that the input is a
sequence file.

I guess my question really is: does Hadoop operate on a block basis, or does
it respect some sort of logical structure within a block when it feeds input
data to the mappers?
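To make option (3) concrete, here is a toy Python model, a minimal sketch under simplified assumptions: fixed-size records laid out back to back, a hypothetical 4-byte sync marker written before every third record, and splits aligned to 64-byte "blocks". The reader for a split skips ahead to the first sync marker at or after the split's start, then keeps reading past the split's end until it reaches a record preceded by a sync marker. This is loosely modeled on how the SequenceFile record reader is generally understood to use sync points; the names `layout`, `read_split` and `blocks_touched` are hypothetical, and this is not Hadoop code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical size of a sync marker in this toy layout (bytes).
SYNC_SIZE = 4


@dataclass
class Entry:
    rec_id: int
    entry_pos: int     # where the entry begins (at its sync marker, if any)
    rec_start: int     # first byte of the record itself
    rec_end: int       # one past the record's last byte
    sync_before: bool  # True if a sync marker precedes this record


def layout(record_sizes: List[int], sync_every: int) -> List[Entry]:
    """Lay records out back to back, writing a sync marker before every
    `sync_every`-th record (record 0 gets none in this toy model)."""
    entries: List[Entry] = []
    pos = 0
    for i, size in enumerate(record_sizes):
        sync = i > 0 and i % sync_every == 0
        entry_pos = pos
        if sync:
            pos += SYNC_SIZE
        entries.append(Entry(i, entry_pos, pos, pos + size, sync))
        pos += size
    return entries


def read_split(entries: List[Entry], start: int, end: int) -> List[Entry]:
    """Records a reader for the byte range [start, end) would emit."""
    if start == 0:
        i = 0
    else:
        # Skip ahead to the first sync marker at or after `start`; a record
        # straddling `start` belongs to the previous split's reader.
        i = next((k for k, e in enumerate(entries)
                  if e.sync_before and e.entry_pos >= start), len(entries))
    out: List[Entry] = []
    for e in entries[i:]:
        # Keep reading past `end` until a sync-preceded record begins at or
        # after `end`; that record is left for the next split.
        if e.sync_before and e.entry_pos >= end:
            break
        out.append(e)
    return out


def blocks_touched(recs: List[Entry], block_size: int) -> List[int]:
    """Indexes of the fixed-size blocks holding the emitted records' bytes."""
    if not recs:
        return []
    first, last = recs[0].entry_pos, recs[-1].rec_end - 1
    return list(range(first // block_size, last // block_size + 1))


if __name__ == "__main__":
    BLOCK = 64  # toy block size; splits are aligned to blocks
    entries = layout([30] * 10, sync_every=3)
    total = entries[-1].rec_end
    for s in range(0, total, BLOCK):
        recs = read_split(entries, s, min(s + BLOCK, total))
        print(s, [e.rec_id for e in recs], blocks_touched(recs, BLOCK))
```

With these sizes, record 2 occupies bytes 60 to 89 and so straddles the block boundary at byte 64. The reader for the first split emits records 0, 1 and 2 and touches blocks 0 and 1, stopping just before the sync marker at byte 90; every record is emitted by exactly one split. Because sync markers in this toy layout are spaced about as widely as blocks, the split starting at byte 192 emits no records.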
