In-Reply-To: 
 <CACiRYOfxF=qp9FCRAgBKwcosGvB5qtuAyFZ6Zvpoygev08jcsA@mail.gmail.com>
References: 
 <CACiRYOfxF=qp9FCRAgBKwcosGvB5qtuAyFZ6Zvpoygev08jcsA@mail.gmail.com>
From: Harsh J <harsh@cloudera.com>
Date: Sat, 28 Jul 2012 03:36:28 +0530
Message-ID: 
 <CAOcnVr0NLRC0dO33pOkyLyNJdhPCXXzvrAxw_YZRObcqkmUwfw@mail.gmail.com>
Subject: Re: Reducer MapFileOutputFormat
To: common-user@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

Hey Mike,

Inline.

On Tue, Jul 24, 2012 at 1:39 AM, Mike S <mikesam460@gmail.com> wrote:
> If I set my reducer output format to MapFileOutputFormat and the job
> has, say, 100 reducers, will the output contain 100 different index
> files (one for each reducer) or one index file for all the reducers
> (basically one index file per job)?

Each MapFile gets its own index file (a MapFile is a directory holding
a "data" file and an "index" file), so yes: 100 index files for 100
MapFiles, given that each reducer creates one MapFile as its output.
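
A single combined index is usually not needed just to look up a key.
Below is a minimal sketch of the idea in plain Java, assuming the job
used the default hash partitioning. It only models the behavior: the
class PartitionedLookup and its methods are hypothetical names, not
Hadoop API, and each TreeMap stands in for one reducer's MapFile. In
Hadoop itself, MapFileOutputFormat.getReaders and getEntry do roughly
this over the real part directories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model (not Hadoop API): each sorted map stands in for one reducer's MapFile.
class PartitionedLookup {
    // Same arithmetic as Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numParts) {
        return (key.hashCode() & Integer.MAX_VALUE) % numParts;
    }

    // Simulates the reduce phase: every key lands in exactly one part.
    static List<TreeMap<String, String>> writeParts(List<String[]> records, int numParts) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numParts; i++) {
            parts.add(new TreeMap<>());
        }
        for (String[] kv : records) {
            parts.get(partitionFor(kv[0], numParts)).put(kv[0], kv[1]);
        }
        return parts;
    }

    // Lookup: pick the part by partition number, then search only that part.
    static String get(List<TreeMap<String, String>> parts, String key) {
        return parts.get(partitionFor(key, parts.size())).get(key);
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add(new String[] {"key" + i, "value" + i});
        }
        List<TreeMap<String, String>> parts = writeParts(records, 100);
        System.out.println(get(parts, "key42")); // prints value42
    }
}
```

The reader only has to apply the same partition function the job used
when writing, so each lookup touches exactly one of the 100 indexes.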

> If it is one index file per reducer, can I rely on HDFS append to
> change the index write behavior and build one index file from all the
> reducers, basically by making all the parallel reducers append to one
> index file? The data files do not matter.

I don't think MapFiles are appendable yet. For one, the index
complicates things, as it keeps offsets into the data file that run
from 0 to the file's length (I think). Work on easily appendable
sequence files has been ongoing, though:
https://issues.apache.org/jira/browse/HADOOP-7139. Maybe you can take
a look and help enable MapFiles to do the same somehow?
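
To make the offset concern concrete, here is a toy model in plain
Java. It assumes, as guessed above, that every index entry stores a
byte offset counted from the start of its own data file. The names
IndexMergeSketch, IndexEntry, concat and isSorted are made up for
illustration and are not MapFile API. The sketch shows two things a
shared index would have to deal with: offsets from different reducers
all start at 0 and need rebasing (or a tag naming their data file),
and keys arriving from hash-partitioned reducers are not in the sorted
order a MapFile index is searched in.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not MapFile API) of what naively joining two per-reducer indexes involves.
class IndexMergeSketch {
    // One toy index entry: a key and the byte offset of its record in the data file.
    static final class IndexEntry {
        final String key;
        final long offset;

        IndexEntry(String key, long offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    // Appends index b after index a, shifting b's offsets by the length of a's data file.
    static List<IndexEntry> concat(List<IndexEntry> a, long aDataLength, List<IndexEntry> b) {
        List<IndexEntry> merged = new ArrayList<>(a);
        for (IndexEntry e : b) {
            merged.add(new IndexEntry(e.key, e.offset + aDataLength));
        }
        return merged;
    }

    // Returns true only if the keys are in non-decreasing order.
    static boolean isSorted(List<IndexEntry> index) {
        for (int i = 1; i < index.size(); i++) {
            if (index.get(i - 1).key.compareTo(index.get(i).key) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<IndexEntry> reducerA = new ArrayList<>();
        reducerA.add(new IndexEntry("apple", 0));
        reducerA.add(new IndexEntry("cherry", 40));
        List<IndexEntry> reducerB = new ArrayList<>();
        reducerB.add(new IndexEntry("banana", 0));
        reducerB.add(new IndexEntry("date", 35));

        // Assume reducer A's data file is 80 bytes long (made-up number).
        List<IndexEntry> merged = concat(reducerA, 80, reducerB);
        for (IndexEntry e : merged) {
            System.out.println(e.key + " -> " + e.offset);
        }
        System.out.println("sorted: " + isSorted(merged));
    }
}
```

Each per-reducer index is sorted on its own, but the joined one is
not, so even with working appends the reducers could not simply write
to one index file in parallel.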

-- 
Harsh J