hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Reducer MapFileOutputFormat
Date Fri, 27 Jul 2012 22:07:09 GMT
Hi Bertrand,

I believe he is talking about MapFile's index files, explained here:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html
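To make the per-reducer layout concrete: MapFileOutputFormat writes one MapFile per reducer (a part directory holding a `data` file and an `index` file), so 100 reducers give 100 index files. A single job-wide index is not needed for lookups, because the job's partitioner already decides which reducer's MapFile can contain a given key. Hadoop exposes this as MapFileOutputFormat.getReaders() together with getEntry(readers, partitioner, key, value). The sketch below is a hypothetical in-memory model of that idea, not Hadoop code. The class PerReducerLookup and its methods are made up for illustration, a TreeMap stands in for each reducer's sorted data and index, and the partition formula assumes the default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model (not Hadoop code) of a job's MapFile output:
// one sorted "data + index" per reducer, as MapFileOutputFormat produces.
public class PerReducerLookup {

    // Assumption: same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // One sorted map per reducer stands in for part-NNNNN/{data,index}.
    final List<TreeMap<String, String>> reducerOutputs = new ArrayList<>();

    PerReducerLookup(int numReduceTasks) {
        for (int i = 0; i < numReduceTasks; i++) {
            reducerOutputs.add(new TreeMap<>());
        }
    }

    // The shuffle routes each key to exactly one reducer's output.
    void emit(String key, String value) {
        reducerOutputs.get(partition(key, reducerOutputs.size())).put(key, value);
    }

    // A lookup consults only the one reducer output the key can live in,
    // analogous to MapFileOutputFormat.getEntry(readers, partitioner, key, value).
    String get(String key) {
        return reducerOutputs.get(partition(key, reducerOutputs.size())).get(key);
    }

    public static void main(String[] args) {
        PerReducerLookup job = new PerReducerLookup(100); // "100 reducers"
        for (int i = 0; i < 1000; i++) {
            job.emit("key-" + i, "value-" + i);
        }
        System.out.println(job.get("key-42"));  // prints value-42
        System.out.println(job.get("missing")); // prints null
    }
}
```

Because emit() and get() apply the same deterministic partition, each lookup touches exactly one of the per-reducer indexes, so having 100 of them does not mean scanning 100 files per lookup.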

On Fri, Jul 27, 2012 at 11:24 AM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
> Your use of 'index' is indeed not clear. Are you talking about Hive or
> HBase?
>
> I can confirm that you will have one result file per reducer. Of course,
> for efficiency reasons, you need to limit the number of files. But if you
> are using multiple reducers, it should mean that one reducer isn't fast
> enough, so it can be assumed that the output of each reducer is big
> enough. If that is not the case, you can limit the number of reducers to one.
>
> In general, the 'fragmentation' of the results is dealt with by the next job.
> You should provide more information about your real problem and its context.
>
> Bertrand
>
> On Fri, Jul 27, 2012 at 3:15 AM, syed kather <in.abdul@gmail.com> wrote:
>
>> Mike,
>> Can you please give more details? The context is not clear. Can you share your
>> use case, if possible?
>> On Jul 24, 2012 1:40 AM, "Mike S" <mikesam460@gmail.com> wrote:
>>
>> > If I set my reducer output to MapFileOutputFormat and the job has,
>> > say, 100 reducers, will the output contain 100 different index
>> > files (one for each reducer) or one index file for all the reducers
>> > (basically one index file per job)?
>> >
>> > If it is one index file per reducer, can I rely on HDFS append to change
>> > the index-writing behavior and build one index file from all the
>> > reducers, basically by making all the parallel reducers append to
>> > one index file? The data files do not matter.
>> >
>>
>
>
>
> --
> Bertrand Dechoux



-- 
Harsh J
