hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "WordCount" by OwenOMalley
Date Wed, 28 Jun 2006 21:23:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/WordCount

------------------------------------------------------------------------------
  
  The '''WordCount''' example reads text files and counts how often words occur.  The input is
text files and the output is text files, each line of which contains a word and the count
of how often it occurred, separated by a tab.
  
- Each mapper takes a line as input and breaks it into words. It then emits word and 1 pair.
Each reducer sums the frequencies of a word.
+ Each mapper takes a line as input and breaks it into words. It then emits a key/value pair
of the word and 1. Each reducer sums the counts for each word and emits a single key/value
pair with the word and its sum.
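
The map and reduce steps described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code of the Hadoop example: it uses no Hadoop classes, the names `WordCountSketch`, `map`, `reduce` and `run` are hypothetical, words are assumed to be separated by whitespace, and the grouping by word inside `run` stands in for the work the framework does between the two phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; none of these names come from Hadoop.
public class WordCountSketch {

    // Map step: break one input line into words and emit a (word, 1) pair per word.
    // Assumption: words are separated by whitespace.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce step: sum every count that was emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: group the map output by word (a stand-in for the framework),
    // then call reduce once per word.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of("the quick fox", "the lazy dog the end"));
        // Print one word per line with its count, separated by a tab.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```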
  
- The output of maps are locally summed by setting the comibiner class to be the same as the
Reducer class.
+ As an optimization, the reducer is also used as a combiner on the map outputs. This reduces
the amount of data sent across the network by combining each word into a single record.
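
The effect of the combiner can be illustrated with a small stand-alone Java sketch. It is not Hadoop code: `CombinerSketch`, `map` and `sumByWord` are hypothetical names, and whitespace-separated words are an assumption. Because addition is associative and commutative, the same summing function can be applied first to one mapper's output and then again to the combined records without changing the final counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; none of these names come from Hadoop.
public class CombinerSketch {

    // Map step: emit a (word, 1) pair for each whitespace-separated word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Sums the counts per word. The same function serves as combiner
    // (run on one mapper's output) and as reducer (run on all combined output).
    static List<Map.Entry<String, Integer>> sumByWord(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> record : records) {
            sums.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : sums.entrySet()) {
            out.add(Map.entry(entry.getKey(), entry.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Output of a single mapper before and after combining.
        List<Map.Entry<String, Integer>> raw = map("to be or not to be");
        List<Map.Entry<String, Integer>> combined = sumByWord(raw);
        System.out.println("records before combining: " + raw.size());
        System.out.println("records after combining:  " + combined.size());

        // The reducer sees the combined records from every mapper.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>(combined);
        shuffled.addAll(sumByWord(map("to do or not to do")));
        for (Map.Entry<String, Integer> entry : sumByWord(shuffled)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Each mapper here forwards one record per distinct word instead of one record per occurrence, which is the saving the paragraph above describes.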
  
  To run the example, the command syntax is[[BR]]
- bin/hadoop org.apache.hadoop.examples.WordCount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
+ bin/hadoop jar build/hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
  
