hadoop-common-user mailing list archives

From Mark <static.void....@gmail.com>
Subject InputSplit and RecordReader
Date Fri, 20 Aug 2010 01:47:56 GMT
From what I understand, an InputSplit is a byte slice of a particular 
file which is then handed off to an individual mapper for processing. Is 
the size of the InputSplit equal to the Hadoop block size, i.e. 64/128 MB? 
If not, what is the size?
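
In case it makes my first question clearer, here is my rough mental model 
of how the split size gets picked. The names and the formula are my own 
guess, not actual Hadoop source, so please correct me if this is wrong:

// My guess at FileInputFormat-style split sizing; not real Hadoop code.
public class SplitSizeSketch {
    // blockSize: the HDFS block size for the file (e.g. 64 MB or 128 MB)
    // minSize/maxSize: what I assume the min/max split size settings control
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        // Clamp the block size between min and max, so with the defaults
        // the split size would simply equal the block size.
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB block
        long minSize = 1;                    // assumed default minimum
        long maxSize = Long.MAX_VALUE;       // assumed default maximum
        System.out.println(computeSplitSize(blockSize, minSize, maxSize)); // 67108864
    }
}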

Now, the RecordReader takes the bytes from the InputSplit and transforms 
them into a record-oriented structure suitable for use within a mapper, 
i.e. key/value pairs, correct? The wiki says it is the RecordReader's job 
to respect record boundaries. How is this accomplished? Say I have an 
InputSplit which is 100 KB in size and each record is approximately 30 KB 
in size. What happens to the last 10 KB in this example? I believe I read 
somewhere that the reader will read past that boundary, but how is that 
possible if the RecordReader has only been presented with 100 KB?
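
In case it helps show where I am confused, here is a toy, local-file 
sketch of how I imagine the boundary handling could work. All the names 
are mine and it is plain Java against a RandomAccessFile, not actual 
LineRecordReader code. My guess is that the reader has the whole file open 
and merely seeks to the split's offset, which would let it run past the 
nominal end to finish its last record, while the next split's reader skips 
that partial record:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Toy sketch of split-boundary handling for line-oriented records.
// Purely my own guess at the mechanism, not Hadoop source.
public class BoundarySketch {
    public static void readSplit(File file, long start, long length) throws IOException {
        long end = start + length;
        RandomAccessFile in = new RandomAccessFile(file, "r");
        long pos = start;
        if (start != 0) {
            // Not the first split: back up one byte and discard up to the next
            // newline. The previous split's reader owns any record that crosses
            // into this split, so the partial record at the front is skipped.
            in.seek(start - 1);
            in.readLine();
            pos = in.getFilePointer();
        } else {
            in.seek(0);
        }
        String line;
        // Emit every record that *starts* inside [start, end). The last record
        // is allowed to run past 'end', which is why (I assume) the reader is
        // not literally limited to the 100 KB it was handed.
        while (pos < end && (line = in.readLine()) != null) {
            System.out.println("record at offset " + pos + ": " + line);
            pos = in.getFilePointer();
        }
        in.close();
    }
}

Is that roughly what happens, or am I off base?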

Can someone please clarify some of these issues for me? Thanks.
