From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Wed, 19 Apr 2006 05:03:54 -0000
Subject: [Lucene-hadoop Wiki] Update of "HadoopMapReduce" by TeppoKurki

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by TeppoKurki:
http://wiki.apache.org/lucene-hadoop/HadoopMapReduce

------------------------------------------------------------------------------
  When an individual !MapTask task starts it will open a new output writer per configured Reduce task. It will then proceed to read its !FileSplit using the !RecordReader it gets from the specified
- InputFormat. !InputFormat parses the input and generates
+ !InputFormat. !InputFormat parses the input and generates
  key-value pairs. It is not necessary for the !InputFormat to generate both "meaningful" keys and values. For example the default !TextInputFormat's output consists of input lines as
@@ -31, +31 @@
  passed to the configured Mapper. The user supplied Mapper does whatever it wants with the input pair and calls [http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/mapred/OutputCollector.html#collect(org.apache.hadoop.io.WritableComparable,%20org.apache.hadoop.io.Writable) OutputCollector.collect] with key-value pairs of its own choosing. The output it generates must use one key class and one value class, because
- the Map output will be eventually written into a SequenceFile,
+ the Map output will be eventually written into a !SequenceFile,
  which has per file type information and all the records must have the same type (use subclassing if you want to output different data structures). The Map input and output key-value