From: Harsh J
Date: Sat, 28 Jul 2012 03:36:28 +0530
Subject: Re: Reducer MapFileOutpuFormat
To: common-user@hadoop.apache.org

Hey Mike,

Inline.

On Tue, Jul 24, 2012 at 1:39 AM, Mike S wrote:
> If I set my reducer output to map file output format and the job would
> say have 100 reducers, will the output generate 100 different index
> file (one for each reducer) or one index file for all the reducers
> (basically one index file per job)?

Each MapFile gets its own index file, so yes: 100 reducers give you 100
MapFiles and therefore 100 index files, given that each reducer creates
one MapFile as its output.

> If it is one index file per reducer, can rely on HDFS append to change
> the index write behavior and build one index file from all the
> reducers by basically making all the parallel reducers to append to
> one index file? Data files do not matter.

I don't think MapFiles are appendable yet. For one, the index
complicates things, because it stores offsets into its own data file,
running from 0 to that file's length (I think). Work on easily
appendable SequenceFiles has been ongoing, though:
https://issues.apache.org/jira/browse/HADOOP-7139. Maybe you can take a
look and help enable MapFiles to do the same somehow?

--
Harsh J
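
To make the point about index offsets concrete, here is a rough sketch in plain Java. It does not use the Hadoop API and does not reflect the real MapFile on-disk format. IndexEntry, combine and the byte lengths are all hypothetical names and values made up for illustration. It assumes a simplified model in which each index entry is a key plus a byte offset into its own data file, and it ignores key ordering entirely. Under that assumption, an index cannot simply be appended to another one: every offset from the second file has to be shifted by the length of the first data file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexOffsetSketch {

    /** Hypothetical index entry: a key plus the byte offset of its record in the data file. */
    static final class IndexEntry {
        final String key;
        final long offset;

        IndexEntry(String key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    /**
     * Combines two per-file indexes as if the second data file were placed
     * directly after the first one. Offsets in the second index are relative
     * to the start of their own file, so each one is shifted by the length of
     * the first data file. Copying them unchanged would point into the first
     * file's records instead.
     */
    static List<IndexEntry> combine(List<IndexEntry> first, long firstDataLength,
                                    List<IndexEntry> second) {
        List<IndexEntry> combined = new ArrayList<>(first);
        for (IndexEntry e : second) {
            combined.add(new IndexEntry(e.key, e.offset + firstDataLength));
        }
        return combined;
    }

    public static void main(String[] args) {
        // Two made-up per-reducer indexes, each starting at offset 0.
        List<IndexEntry> partA = Arrays.asList(
                new IndexEntry("apple", 0L), new IndexEntry("mango", 4096L));
        List<IndexEntry> partB = Arrays.asList(
                new IndexEntry("peach", 0L), new IndexEntry("plum", 2048L));
        long partALength = 8192L; // assumed size of the first data file, in bytes

        for (IndexEntry e : combine(partA, partALength, partB)) {
            System.out.println(e.key + " -> " + e.offset);
        }
    }
}
```

With parallel reducers the shift is not known up front, since no reducer knows how many bytes the others will have written before it. That is one way to read why a single shared index is awkward to build by appending.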