hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: HDFS file content restrictions
Date Fri, 04 Mar 2011 20:45:41 GMT
If, for example, you have a record with 20MB stored in one block and 1MB in another, MapReduce will feed you the entire 21MB record.  If you are lucky and the map task is executing on a node holding the 20MB block, MapReduce only has to pull the remaining 1MB over the network from HDFS for you.

This glosses over some details, but the point is that MR will feed you whole records regardless
of whether they are stored in one block or two.
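
To make the mechanics concrete, here is a plain-Java sketch of the rule Hadoop's record readers (e.g. LineRecordReader for newline-delimited text input) follow. This is illustrative code under that assumption, not the actual Hadoop implementation: a reader for any split except the first skips the partial record at the start of its split, and every reader keeps reading past the end of its split to finish the record it is in the middle of.

    import java.util.ArrayList;
    import java.util.List;

    public class SplitDemo {
        // Return the newline-delimited records "owned" by the split
        // [splitStart, splitEnd): the records that *start* inside the
        // split, each read to completion even if it runs past splitEnd.
        static List<String> readSplit(byte[] data, int splitStart, int splitEnd) {
            List<String> records = new ArrayList<>();
            int pos = splitStart;
            if (splitStart != 0) {
                // Skip the (possibly partial) record at the head of this
                // split; the previous split's reader owns it.
                while (pos < data.length && data[pos] != '\n') pos++;
                pos++; // step past the newline
            }
            while (pos < splitEnd && pos < data.length) {
                int recStart = pos;
                // Scan to the end of the record, even if that crosses
                // the split boundary.
                while (pos < data.length && data[pos] != '\n') pos++;
                records.add(new String(data, recStart, pos - recStart));
                pos++;
            }
            return records;
        }

        public static void main(String[] args) {
            byte[] data = "rec1\nrec2-spans-the-boundary\nrec3\n".getBytes();
            int boundary = 10; // pretend the block boundary falls here, mid-record
            System.out.println(readSplit(data, 0, boundary));
            // -> [rec1, rec2-spans-the-boundary]
            System.out.println(readSplit(data, boundary, data.length));
            // -> [rec3]
        }
    }

Because the first reader runs past its boundary to finish the spanning record and the second reader skips the fragment it lands on, every record is delivered whole to exactly one mapper.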


On Mar 4, 2011, at 2:24 PM, Kelly Burkhart wrote:

> On Fri, Mar 4, 2011 at 1:42 PM, Harsh J <qwertymaniac@gmail.com> wrote:
>> HDFS does not operate with records in mind.
> So does that mean that HDFS will break a file at exactly <blocksize>
> bytes?  Map/Reduce *does* operate with records in mind, so what
> happens to the split record?  Does HDFS put the fragments back
> together and deliver the reconstructed record to one map?  Or are both
> fragments and consequently the whole record discarded?
> Thanks,
> -Kelly
