hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Grep" by DanielNaber
Date Thu, 09 Oct 2008 18:53:39 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DanielNaber:
http://wiki.apache.org/hadoop/Grep

------------------------------------------------------------------------------
  = Grep Example =
- '''Grep''' example extracts matching strings from text files and count how many time they occured.
+ The '''Grep''' example extracts matching strings from text files and counts how many times they occurred.
+ 
+ To run the example, type the following command:[[BR]]
+ {{{bin/hadoop org.apache.hadoop.examples.Grep <indir> <outdir> <regex> [<group>]}}}
+ 
+ The command works differently from the Unix {{{grep}}} command: it does not display the complete matching line, only the matching string. To display whole lines matching "foo", use {{{.*foo.*}}} as the regular expression.
  
  The program runs two map/reduce jobs in sequence. The first job counts how many times each matching string occurred, and the second job sorts the matching strings by their frequency and stores the output in a single output file.
   
@@ -11, +16 @@

  
  The example also demonstrates how to pass a command-line parameter to a mapper or a reducer. This is done by adding (key, value) pairs to the job's configuration before the job is submitted. Map or reduce tasks can then read the value from the job's configuration in the ''configure'' method.
  
+ Grep supports generic options: see DevelopmentCommandLineOptions 
- To run the example, type the following command:[[BR]]
- bin/hadoop org.apache.hadoop.examples.Grep <indir> <outdir> <regex> [<group>]
  
- Grep supports generic options : see DevelopmentCommandLineOptions 
- 
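
Editor's illustration (not part of the wiki change): the difference from Unix grep and the count-then-sort sequence described above can be sketched in plain Python. This is not Hadoop code and does not use the Hadoop API. The function name `grep_counts` is hypothetical, and both the highest-first ordering and the alphabetical tie-break are assumptions, since the page only says the strings are sorted by frequency.

```python
import re
from collections import Counter


def grep_counts(lines, regex, group=0):
    """Count each matched string, then order the results by count.

    Hypothetical helper for illustration only; it mimics the behaviour the
    wiki page describes, not the actual Hadoop implementation.
    """
    pattern = re.compile(regex)
    counts = Counter()
    # Stage 1 (stands in for the first job): collect every matching string,
    # not the whole line, and sum the occurrences per string.
    for line in lines:
        for match in pattern.finditer(line):
            counts[match.group(group)] += 1
    # Stage 2 (stands in for the second job): sort by frequency.
    # Assumption: highest count first, ties broken alphabetically so the
    # result is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


lines = ["foo bar foo", "a foo here", "nothing"]
print(grep_counts(lines, r"foo"))      # only the matched string is counted
print(grep_counts(lines, r".*foo.*"))  # whole matching lines are counted
```

With the pattern `foo` the sketch reports the string "foo" three times, while `.*foo.*` reports each matching line once, which is the behaviour the new paragraph on the wiki page points out.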
