hadoop-mapreduce-user mailing list archives

From GUOJUN Zhu <guojun_...@freddiemac.com>
Subject Does FileSplit respect the record boundary?
Date Fri, 10 Feb 2012 16:56:34 GMT

I am learning Hadoop.  We have some specially formatted text files for
input, so we need to write a custom InputFormat, probably based on
FileInputFormat.  Does FileInputFormat respect record boundaries
(every line, or maybe every other line)?  I am reading the source code
(1.0.0).  For example, in LineRecordReader, is the "in" field (InputStream)
passed to LineReader(in, ..) the full HDFS file (of many blocks), or just
the real local file of one block?  All the books I have read give very
little detail about this.  Can any expert point me to a reference, or tell
me which part of the source code I should concentrate on?  Thanks.
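
To make the question concrete, here is a minimal sketch in plain Java (no
Hadoop classes) of the boundary convention a line-oriented reader could
follow if it sees the whole file rather than one block.  It is only an
illustration, not code taken from LineRecordReader: the class name
SplitBoundarySketch and the method readSplit are hypothetical, and the rule
it assumes is that a line belongs to the split in which its first byte
falls, so a reader skips a partial first line and reads past its split end
to finish the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop.
final class SplitBoundarySketch {

    // Returns the lines "owned" by the byte range [start, end) of data.
    // Assumed rule: a line is owned by the split containing its first byte.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte and discard everything through the next
            // newline. If a line begins exactly at start, only the
            // previous line's terminator is discarded.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> records = new ArrayList<>();
        // Keep reading while the next line STARTS inside the split; the
        // line itself may extend past end.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // Split boundaries deliberately fall in the middle of lines.
        System.out.println(readSplit(data, 0, 9));
        System.out.println(readSplit(data, 9, 18));
        System.out.println(readSplit(data, 18, data.length));
    }
}
```

Under that assumed rule every line is returned by exactly one split, even
though the split offsets cut through "bravo" and "charlie".  It only works
if the stream covers the full file, which is the point I would like
confirmed.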

Zhu, Guojun
Modeling Sr Graduate
Financial Engineering
Freddie Mac