hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HadoopMapReduce" by TeppoKurki
Date Thu, 20 Apr 2006 11:37:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by TeppoKurki:
http://wiki.apache.org/lucene-hadoop/HadoopMapReduce

------------------------------------------------------------------------------
  writer per configured Reduce task. It will then proceed to read
  its !FileSplit using the [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/RecordReader.html RecordReader] it gets from the specified
  [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/InputFormat.html InputFormat]. !InputFormat parses the input and generates
- key-value pairs. InputFormat must also handle records that may be split on the FileSplit boundary - for example [http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TextInputFormat.java?view=markup TextInputFormat] reads the last line of the FileSplit past the split boundary and when it starts reading other than the first FileSplit first scans for the first newline.
+ key-value pairs. !InputFormat must also handle records that may be split on the !FileSplit boundary - for example [http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TextInputFormat.java?view=markup TextInputFormat] reads the last line of the !FileSplit past the split boundary and when it starts reading other than the first !FileSplit it first scans for the first newline.
  
  It is not necessary for the !InputFormat to
  generate both "meaningful" keys and values. For example the
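
The split-boundary rule described in the hunk above (read the last line past the end of the split; in any split other than the first, scan to the first newline before reading) can be illustrated with a small sketch. This is not Hadoop's Java code: `read_split` is a hypothetical helper, the input is an in-memory byte string rather than a file, and the choice to begin the newline scan one byte before the split start (so that a split beginning exactly on a line boundary keeps that line) is an assumption made for this illustration.

```python
from typing import List


def read_split(data: bytes, start: int, length: int) -> List[bytes]:
    """Return the lines that belong to the split [start, start + length).

    Hypothetical sketch of the boundary rule, not Hadoop's implementation.
    A line belongs to the split in which its first byte lies.
    """
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: scan for the first newline. Starting the
        # scan at start - 1 is an assumption of this sketch; it keeps a
        # line that begins exactly at `start` in this split.
        nl = data.find(b"\n", start - 1)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # Read every line that starts before the split end. The last such
    # line may run past `end`; it is still read in full.
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


if __name__ == "__main__":
    data = b"alpha\nbeta\ngamma\n"
    # Three splits of up to 8 bytes each; "beta" straddles the first boundary.
    for start in (0, 8, 16):
        print(start, read_split(data, start, 8))
```

Under these assumptions, each line is returned by exactly one split, even when a line crosses a split boundary or a split starts exactly at the beginning of a line.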
