hadoop-common-user mailing list archives

From Teppo Kurki <...@iki.fi>
Subject Re: How is big file got divided
Date Thu, 20 Apr 2006 11:40:59 GMT
Lei Chen wrote:

>It seems that big file can be split within one line. But the map/reduce
>will still work properly since the dfs layer will hide the block layout
>information from the map/reduce tasks.

It's up to the InputFormat to handle records that are split on FileSplit 
boundaries.

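TextInputFormat apparently handles this at both ends of a split: unless 
the split starts at the beginning of the file, it skips ahead and starts 
reading from the first linebreak encountered, and it reads past the end 
of the Split boundary to finish the last line it started. See 
http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TextInputFormat.java?view=markup 
for details.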
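
To make the boundary handling described above concrete, here is a minimal 
sketch of the idea. It is not the Hadoop source: the class SplitLineSketch 
and the method readSplit are hypothetical names, the "file" is an in-memory 
byte array, and '\n' is assumed to be the only line terminator. The sketch 
also assumes that a reader backs up one byte before skipping, so that a line 
starting exactly on the boundary is not lost.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual TextInputFormat code.
class SplitLineSketch {

    // Returns the lines owned by the split [start, start + length) of data.
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;

        if (start != 0) {
            // Back up one byte, then skip through the first line break.
            // If the previous byte is '\n', a line begins exactly at start
            // and this split keeps it; otherwise the partial line is left
            // to the previous split, which reads past its own end.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the line break
        }

        List<String> lines = new ArrayList<>();
        // A line belongs to this split if it starts before end, even if
        // it finishes after end.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart,
                                 StandardCharsets.UTF_8));
            pos++; // step over the line break
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at byte 8 falls in the middle of "bravo".
        System.out.println(readSplit(data, 0, 8));   // [alpha, bravo]
        System.out.println(readSplit(data, 8, 12));  // [charlie]
    }
}
```

With these two rules every line is read exactly once, no matter where the 
split boundary falls, and no coordination between the readers is needed.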

(I added this info to http://wiki.apache.org/lucene-hadoop/HadoopMapReduce).




