Subject: Re: MultipleOutputs or Partitioner
From: Sonal Goyal
To: mapreduce-user@hadoop.apache.org
Date: Mon, 10 May 2010 20:59:53 +0530

Hi Alan,

You can use MultipleOutputFormat. You can override the generateFileName... methods to get the functionality you want.

A partitioner controls how data moves from the mappers to the reducers, so if you take that approach you will have to set the number of reducers to the number of files you want, which is not the best option if some days have more data than others. You also don't have control over the file names.

See Tom White's Hadoop: The Definitive Guide for an excellent example and usage.
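For reference, a minimal sketch of the kind of subclass described above (old "mapred" API, which is where MultipleTextOutputFormat lives in 0.20.x). It assumes Text keys shaped like the ones in your mail (hostA_VarX_2010-05-01_morning); the class name and the field positions are illustrative, not something from this thread:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Hypothetical subclass: routes each record to a file named after the
    // day/period portion of the key, e.g. "2010-05-01_morning".
    public class DayPeriodTextOutputFormat extends MultipleTextOutputFormat<Text, Text> {

        @Override
        protected String generateFileNameForKeyValue(Text key, Text value, String name) {
            // Key layout assumed from the question: host_Var_date_period
            String[] parts = key.toString().split("_");
            String day = parts[2];      // e.g. 2010-05-01
            String period = parts[3];   // e.g. morning or afternoon
            return day + "_" + period;  // all hosts/vars for that slot share one file
        }
    }

You would then wire it into the job with something like conf.setOutputFormat(DayPeriodTextOutputFormat.class) on the JobConf.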
Thanks and Regards,
Sonal
www.meghsoft.com

On Mon, May 10, 2010 at 5:38 PM, Some Body wrote:
> Hi,
>
> I'm trying to understand how to generate multiple outputs in my reducer
> (using 0.20.2+228).
> Do I need MultipleOutput or should I partition my output in the mapper?
>
> My reducer currently gets key/val input pairs like this which all end up in
> my part_r_0000 file.
>
> hostA_VarX_2010-05-01_morning
> hostA_VarY_2010-05-01_morning
> hostA_VarX_2010-05-01_afternoon
> hostA_VarY_2010-05-01_afternoon
> .....
> hostB_VarX_2010-05-01_morning
> hostB_VarY_2010-05-01_morning
> hostB_VarX_2010-05-01_afternoon
> hostB_VarY_2010-05-01_afternoon
> .....
> hostA_VarX_2010-05-02_morning
> hostA_VarY_2010-05-02_morning
> hostA_VarX_2010-05-02_afternoon
> hostA_VarY_2010-05-02_afternoon
> .....
> hostB_VarX_2010-05-02_morning
> hostB_VarY_2010-05-02_morning
> hostB_VarX_2010-05-02_afternoon
> hostB_VarY_2010-05-02_afternoon
> .....
>
> But instead of 1 output file I want one output file per day/group, e.g.
> 2010-05-01_morning.txt
> 2010-05-01_afternoon.txt
>
> Each _