hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HadoopMapReduce" by TeppoKurki
Date Wed, 19 Apr 2006 04:55:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by TeppoKurki:
http://wiki.apache.org/lucene-hadoop/HadoopMapReduce

------------------------------------------------------------------------------
  
  As key-value pairs are read from the RecordReader they are
  passed to the configured Mapper. The user supplied Mapper does
+ whatever it wants with the input pair and calls [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/OutputCollector.html#collect(org.apache.hadoop.io.WritableComparable,%20org.apache.hadoop.io.Writable) OutputCollector.collect] with key-value pairs of its own choosing. The output it
- whatever it wants with the input pair and calls
- <a
- 	href="http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/OutputCollector.html#collect(org.apache.hadoop.io.WritableComparable,%20org.apache.hadoop.io.Writable)">
- 	OutputCollectore.collect
- </a>
- with key-value pairs of its own choosing. The output it
  generates must use one key class and one value class, because
  the Map output will eventually be written into a SequenceFile,
  which has per-file type information and all the records must
@@ -55, +50 @@

  all the Map tasks will be routed so that all pairs for a given
  key end up in files targeted at a specific reduce task.
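The map side described above can be illustrated with a rough, self-contained Java sketch. It runs a word-count style map function over a few input lines, emits pairs through a collector, and routes each pair to a per-reduce-task bucket. `Collector`, `map`, `partition` and `runMapTask` are hypothetical names, not Hadoop's API; `Collector` only stands in for `OutputCollector`. The partition function assumes hash partitioning, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, which is similar to what Hadoop's `HashPartitioner` does. With that rule, every pair for a given key lands in the same reduce task's bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapSideSketch {

    // Hypothetical stand-in for Hadoop's OutputCollector: receives each pair the mapper emits.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // Assumed hash partitioning: a key always maps to the same reduce task.
    // Masking with Integer.MAX_VALUE keeps the index non-negative for negative hash codes.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Word-count style map function: emits (word, 1) for every token in the line.
    static void map(String line, Collector<String, Integer> out) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }

    // Runs the map function over every input line and appends each emitted
    // pair to the bucket of the reduce task chosen by partition().
    static List<List<Map.Entry<String, Integer>>> runMapTask(List<String> lines, int numReduceTasks) {
        List<List<Map.Entry<String, Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        Collector<String, Integer> collector =
            (key, value) -> buckets.get(partition(key, numReduceTasks)).add(Map.entry(key, value));
        for (String line : lines) {
            map(line, collector);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> buckets = runMapTask(List.of("a b a", "c b a"), 2);
        for (int r = 0; r < buckets.size(); r++) {
            System.out.println("reduce task " + r + ": " + buckets.get(r));
        }
    }
}
```

The bucket index depends only on the key. As a result, every occurrence of a word ends up in the same bucket, even when the occurrences come from different input lines.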
  
- == Combiner ==
+ == Combine ==
- The rationale behind using a combiner is that as the Map
+ The rationale behind using a Combiner is that as the Map
  operation outputs its pairs they are already available in
  memory for a reduce-type function. If a Combiner is used, the
  Map output is not immediately written to the output. Instead it
@@ -83, +78 @@

  When a reduce task starts, its input is scattered across
  several files, possibly on several DFS nodes. If run in
  distributed mode, these files must first be copied to the local
- filesystem in a
- <acronym>copy phase</acronym>
- (see
- <a
- 	href="http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java?view=markup">
+ filesystem in a ''copy phase'' (see [http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java?view=markup ReduceTaskRunner]).
- 	ReduceTaskRunner
- </a>
- ) .
  
  Once all the data is available locally, it is appended to one
- file (
- <acronym>append phase</acronym>
- ). The file is then merge sorted so that the key-value pairs for
+ file (''append phase''). The file is then merge sorted so that the key-value pairs for
+ a given key are contiguous (''sort phase''). This makes the actual reduce operation simple: the file is
- a given key are contiguous (
- <acronym>sort phase</acronym>
- ). This makes the actual reduce operation simple: the file is
  read sequentially and the values are passed to the reduce method
  with an iterator reading the input file until the next key
- value is encountered. See
- <a
- 	href="http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTask.java?view=markup">
+ value is encountered. See [http://svn.apache.org/viewcvs.cgi/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTask.java?view=markup ReduceTask] for details.
- 	ReduceTask
- </a>
- for details.
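The sort-then-scan structure of the reduce side can be sketched in Java as follows. The sketch assumes that the pairs routed to one reduce task fit in memory. An in-memory `List.sort` (a stable merge sort) stands in for the on-disk merge sort. The `Reducer` interface and the `sortAndReduce` method are hypothetical names for illustration, not Hadoop's API. The iterator handed to the reducer reads forward until it meets a different key, which mirrors the sequential read the text describes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class SortAndReduceSketch {

    // Hypothetical reduce function: called once per key with an iterator over that key's values.
    interface Reducer<K, V, R> {
        R reduce(K key, Iterator<V> values);
    }

    static <K extends Comparable<K>, V, R> List<Map.Entry<K, R>> sortAndReduce(
            List<Map.Entry<K, V>> pairs, Reducer<K, V, R> reducer) {
        // Sort phase: a stable sort by key makes all pairs for a key contiguous.
        List<Map.Entry<K, V>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());

        List<Map.Entry<K, R>> results = new ArrayList<>();
        int[] pos = {0}; // read position in the sorted "file", shared with the iterator
        while (pos[0] < sorted.size()) {
            final K key = sorted.get(pos[0]).getKey();
            // Reads values sequentially and stops as soon as the next key differs.
            Iterator<V> values = new Iterator<V>() {
                @Override
                public boolean hasNext() {
                    return pos[0] < sorted.size() && sorted.get(pos[0]).getKey().compareTo(key) == 0;
                }

                @Override
                public V next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return sorted.get(pos[0]++).getValue();
                }
            };
            results.add(new AbstractMap.SimpleEntry<>(key, reducer.reduce(key, values)));
            // Skip values the reducer did not consume so the next group starts at a new key.
            while (values.hasNext()) {
                values.next();
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("b", 1), Map.entry("a", 1), Map.entry("b", 1), Map.entry("c", 1), Map.entry("a", 1));
        // Word-count style reducer: sums the values for each key.
        Reducer<String, Integer, Integer> sum = (key, values) -> {
            int total = 0;
            while (values.hasNext()) {
                total += values.next();
            }
            return total;
        };
        System.out.println(sortAndReduce(pairs, sum)); // [a=2, b=2, c=1]
    }
}
```

Values the reducer leaves unread are skipped before the next key is processed. A reducer that only looks at the first value therefore still sees correctly grouped input.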
  
  In the end, the output will consist of one output file per Reduce
  task run. The format of the files can be specified with
+ [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/JobConf.html#setOutputFormat(java.lang.Class) JobConf.setOutputFormat]. If SequenceFileOutputFormat is used the output Key and Value
- <a
- 	href="http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/JobConf.html#setOutputFormat(java.lang.Class)">
- 	JobConf.setOutputFormat
- </a>
- . If SequentialOutputFormat is used the output Key and Value
  classes must also be specified.
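The Combiner paragraph in the diff is cut off at the hunk boundary. The idea it starts to describe, applying a reduce-type function to map output while that output is still in memory, can be sketched in Java. `CombinerSketch.combine` is a hypothetical helper, not Hadoop's API. It assumes the whole map output fits in one in-memory buffer and that the combining function is associative and commutative, as summing counts is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class CombinerSketch {

    // Hypothetical combine step: instead of writing each map output pair immediately,
    // buffer it in memory and fold values for the same key with a reduce-type function.
    // Keys keep the order in which they were first emitted.
    static <K, V> List<Map.Entry<K, V>> combine(List<Map.Entry<K, V>> mapOutput, BinaryOperator<V> combiner) {
        Map<K, V> buffer = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : mapOutput) {
            buffer.merge(pair.getKey(), pair.getValue(), combiner);
        }
        // These combined pairs are what would be written out and later sent to the reducers.
        return new ArrayList<>(buffer.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1), Map.entry("a", 1));
        System.out.println(combine(mapOutput, Integer::sum)); // [a=3, b=1]
    }
}
```

Here four map output pairs shrink to two before anything would be written, which is the saving a Combiner is meant to provide.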
  
