From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Mon, 07 Aug 2006 18:07:18 -0000
Subject: [Lucene-hadoop Wiki] Update of "PythonWordCount" by OwenOMalley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/PythonWordCount

New page:
= WordCount Example in Python =

This is the WordCount example written entirely in [http://python.org/ Python] and compiled into a Java jar file using [http://www.jython.org/Project/index.html Jython]. The program reads text files and counts how often each word occurs. The input is text files and the output is text files, each line of which contains a word and the number of times it occurred, separated by a tab.

Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value pair with the word and its sum.

As an optimization, the reducer is also used as a combiner on the map outputs. This reduces the amount of data sent across the network by collapsing all occurrences of a word in a map's output into a single record.

To compile the example, build the Hadoop code and then run the example's compile script:
{{{
ant
cd src/examples/python
./compile
}}}

To run the example, the command syntax is:
{{{
../../../bin/hadoop jar wc.jar [-m <#maps>] [-r <#reducers>] \
}}}
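The map, combine, and reduce flow described on this page can be illustrated without Hadoop. The following is a minimal plain-Python 3 sketch, not the Jython source in src/examples/python: the function names (`map_line`, `reduce_counts`, `group_and_reduce`, `word_count`, `format_output`) are hypothetical, splitting lines on whitespace is an assumption about how the mapper tokenizes, and each inner list passed to `word_count` stands in for the input of one map task.

```python
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Mapper: break a line into words (whitespace split, assumed) and emit (word, 1)."""
    for word in line.split():
        yield word, 1


def reduce_counts(word, counts):
    """Reducer, also reused as the combiner: emit (word, sum of its counts)."""
    yield word, sum(counts)


def group_and_reduce(pairs):
    """Sort pairs by key, group equal keys, and apply the reducer to each group."""
    out = []
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        out.extend(reduce_counts(word, (count for _, count in group)))
    return out


def word_count(splits):
    """Simulate one map task per split, a combiner per task, then a single reduce.

    Returns (final pairs, records emitted by the mappers, records left after combining).
    """
    combined = []
    raw_records = 0
    for split in splits:
        map_out = [kv for line in split for kv in map_line(line)]
        raw_records += len(map_out)
        # Combiner: the same reduce logic, applied locally to this map task's output.
        combined.extend(group_and_reduce(map_out))
    return group_and_reduce(combined), raw_records, len(combined)


def format_output(pairs):
    """Render one 'word<TAB>count' line per word, like the job's text output."""
    return "\n".join(f"{word}\t{count}" for word, count in pairs)


if __name__ == "__main__":
    result, raw, after_combine = word_count([["the cat sat", "the cat"], ["the dog"]])
    print(format_output(result))
    print(f"map output records: {raw}, after combining: {after_combine}")
```

With the two sample splits, the mappers emit 7 records but the combiner reduces them to 5 before the reduce step, which is the network saving the page describes; on large inputs with many repeated words the reduction is much larger.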