hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HadoopMapReduce" by MattKangas
Date Wed, 19 Apr 2006 16:07:36 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by MattKangas:

  == Map ==
  As the Map operation is parallelized, the input file set is first
- split to several pieces called !FileSplits. If an individual file
+ split to several pieces called [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/FileSplit.html FileSplits]. If an individual file
  is so large that it will affect seek time, it will be split into
  several Splits. The splitting does not know anything about the
  input file's internal logical structure, for example
@@ -18, +18 @@

  When an individual !MapTask starts, it will open a new output
  writer per configured Reduce task. It will then proceed to read
- its !FileSplit using the !RecordReader it gets from the specified
- !InputFormat. !InputFormat parses the input and generates
+ its !FileSplit using the [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/RecordReader.html RecordReader] it gets from the specified
+ [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/InputFormat.html InputFormat]. !InputFormat parses the input and generates
  key-value pairs. It is not necessary for the !InputFormat to
  generate both "meaningful" keys and values. For example, the
  default !TextInputFormat's output consists of input lines as
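The wiki text above describes !InputFormat turning a !FileSplit into key-value records; for the default !TextInputFormat, the key is the byte offset of each line within the file and the value is the line's text. As a rough, standalone illustration (not part of the wiki page, and not using the Hadoop API itself), the following plain-Java sketch mimics the records !TextInputFormat would generate for a small chunk of text. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch of the records Hadoop's default
// TextInputFormat produces for a split: one record per input line,
// keyed by the byte offset where that line starts in the file.
public class TextRecordSketch {

  // Returns records formatted as "offset<TAB>line".
  public static List<String> toRecords(String data) {
    List<String> records = new ArrayList<String>();
    long offset = 0;
    for (String line : data.split("\n", -1)) {
      // A trailing newline yields one empty trailing token; skip it.
      if (line.isEmpty() && offset == data.length()) {
        break;
      }
      records.add(offset + "\t" + line);
      offset += line.length() + 1; // +1 for the newline separator
    }
    return records;
  }

  public static void main(String[] args) {
    for (String record : toRecords("foo\nbar baz\nqux\n")) {
      System.out.println(record);
    }
    // Prints:
    // 0	foo
    // 4	bar baz
    // 12	qux
  }
}
```

In real Hadoop the Mapper then receives these pairs one at a time; the byte-offset key is rarely "meaningful" to the application, which is exactly the point the wiki text makes.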
