hadoop-common-user mailing list archives

From Teppo Kurki <...@iki.fi>
Subject Re: How is big file got divided
Date Thu, 20 Apr 2006 11:40:59 GMT
Lei Chen wrote:

>It seems that big
>file can be split within one line. But the map/reduce will still work
>properly since the dfs layer will hide the block layout information from the
>map/reduce tasks.

It's up to the InputFormat to handle records that are split across
FileSplit boundaries.

TextInputFormat apparently reads one line past the end of the split
boundary, and a split that does not begin at the start of the file
starts reading only after the first line break it encounters. See

for details.
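
To make the convention concrete, here is a minimal sketch in Python. It is
not Hadoop's code: read_split and read_all are hypothetical names, the
whole file is assumed to fit in memory as bytes, and lines are assumed to
end in "\n". A split that does not start at offset 0 throws away everything
up to and including its first line break, and every split keeps reading
whole lines as long as the line starts at or before the split's end.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by the byte range [start, start + length).

    Hypothetical helper, not a Hadoop API. A line belongs to the split
    in which it starts, except that a line starting exactly on a
    boundary is read by the split that ends there.
    """
    end = start + length
    pos = start
    if start != 0:
        # Discard the (possibly partial) first line; the previous
        # split reads past its own end to pick it up.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    # Keep reading while the next line starts at or before `end`,
    # even if that line runs past the split boundary.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            # Last line of the file has no trailing line break.
            lines.append(data[pos:])
            pos = len(data)
        else:
            lines.append(data[pos:nl])
            pos = nl + 1
    return lines


def read_all(data: bytes, split_size: int) -> list[bytes]:
    """Cut data into fixed-size splits and concatenate their lines."""
    out = []
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        out.extend(read_split(data, start, length))
    return out
```

With this rule each line is returned by exactly one split regardless of
where the byte boundaries fall, so a map task never sees half a line and
no line is processed twice.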

(I added this info to http://wiki.apache.org/lucene-hadoop/HadoopMapReduce).
