hadoop-mapreduce-dev mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: What is the reason for putting the output of one mapper task into one file ?
Date Thu, 17 Jun 2010 16:23:13 GMT
Arun, thanks for your reply.



On Thu, Jun 17, 2010 at 7:57 AM, Arun C Murthy <acm@yahoo-inc.com> wrote:
> Not performance, but stability.
>
> We used to put the output of each map in r files (where r is the number of
> reduces) and quickly found out that the local disk would run out of inodes
> after running a few mid-to-large sized jobs (in terms of m * r).
>
> https://issues.apache.org/jira/browse/HADOOP-331
>
> Arun
>
> On Jun 16, 2010, at 7:53 PM, Jeff Zhang wrote:
>
>> Hi all,
>>
>> I checked the source code of the mapper task, and it seems that the output
>> of one mapper task is one data file and one index file. Each reducer task
>> then fetches its part of the mapper's output.
>> I am wondering why the mapper output is not put into n files (n is
>> the number of reducers), since the mapper task knows the Partitioner and the
>> logic would be much simpler. Is there any performance consideration behind
>> putting the output into one file? Thanks.
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>
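
To make the trade-off discussed above concrete, here is a minimal Java sketch of the "one data file plus one index file" layout. It is not Hadoop's actual map task code: the class SpillSketch, its methods, the hash-based partition function, and the in-memory byte array standing in for the data file are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hadoop.
class SpillSketch {
    // One index entry per reduce partition: where its bytes start and how long they are.
    static final class IndexEntry {
        final long offset;
        final long length;
        IndexEntry(long offset, long length) { this.offset = offset; this.length = length; }
    }

    // Stand-in for a Partitioner: hash the key into one of numReduces buckets.
    static int partition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Group records by partition, then concatenate all partitions into a single
    // "data file" (a byte array here) and record each partition's span in the index.
    static byte[] writeMapOutput(List<String> keys, int numReduces, List<IndexEntry> index) {
        List<ByteArrayOutputStream> buckets = new ArrayList<>();
        for (int p = 0; p < numReduces; p++) {
            buckets.add(new ByteArrayOutputStream());
        }
        for (String key : keys) {
            byte[] record = (key + "\n").getBytes(StandardCharsets.UTF_8);
            buckets.get(partition(key, numReduces)).write(record, 0, record.length);
        }
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (ByteArrayOutputStream bucket : buckets) {
            index.add(new IndexEntry(data.size(), bucket.size()));
            byte[] bytes = bucket.toByteArray();
            data.write(bytes, 0, bytes.length);
        }
        return data.toByteArray();
    }

    // What a reducer would fetch: only the slice the index assigns to its partition.
    static String fetchPartition(byte[] data, List<IndexEntry> index, int reducer) {
        IndexEntry entry = index.get(reducer);
        return new String(data, (int) entry.offset, (int) entry.length, StandardCharsets.UTF_8);
    }

    // Local files created by a job with m maps and r reduces, one file per (map, reduce) pair.
    static long filesPerReduceLayout(long maps, long reduces) {
        return maps * reduces;
    }

    // Local files created when each map writes one data file and one index file.
    static long filesSingleFileLayout(long maps) {
        return maps * 2;
    }
}
```

With made-up job sizes of 1,000 maps and 500 reduces, filesPerReduceLayout gives 500,000 local files across the cluster while filesSingleFileLayout gives 2,000, which is the m * r growth that exhausts inodes. The cost of the single-file layout is the extra index lookup a reducer needs before it can read its slice.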



-- 
Best Regards

Jeff Zhang
