hadoop-mapreduce-dev mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: What is the reason for putting the output of one mapper task into one file ?
Date Thu, 17 Jun 2010 14:57:07 GMT
Not performance, but stability.

We used to put the output of each map in r files (where r is the number
of reduces) and quickly found that the local disks would run out of
inodes after running a few mid-to-large sized jobs (in terms of m * r,
where m is the number of maps).

https://issues.apache.org/jira/browse/HADOOP-331
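
To make the m * r arithmetic concrete, here is a rough sketch in Python. The job size is a made-up example, the function names are hypothetical, and the (offset, length) index is a simplified illustration of the "one data file plus one index file" idea rather than Hadoop's actual on-disk format. The totals are per job across the whole cluster; how many files land on a single disk depends on how many maps ran on that node.

```python
# Illustrative sketch only: numbers and names are assumptions, not Hadoop code.

def intermediate_files(num_maps, num_reduces):
    """Return (files with one file per reduce, files with data + index per map)."""
    per_partition = num_maps * num_reduces  # old layout: r files per map
    data_plus_index = num_maps * 2          # one data file and one index file per map
    return per_partition, data_plus_index


def write_map_output(partitions):
    """Concatenate per-reduce byte strings into one blob plus an (offset, length) index."""
    data = bytearray()
    index = []
    for chunk in partitions:
        index.append((len(data), len(chunk)))  # where this reduce's bytes start, and how many
        data.extend(chunk)
    return bytes(data), index


def read_partition(data, index, reduce_id):
    """Return the slice of the single data blob that one reduce would fetch."""
    offset, length = index[reduce_id]
    return data[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical job: 10,000 maps and 500 reduces.
    old, new = intermediate_files(10_000, 500)
    print(f"one file per reduce: {old} files; data + index: {new} files")

    data, index = write_map_output([b"aa", b"", b"bcd"])
    print(read_partition(data, index, 2))
```

The file count in the second layout grows with m only, which is why the number of reduces stops mattering for inode usage.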

Arun

On Jun 16, 2010, at 7:53 PM, Jeff Zhang wrote:

> Hi all,
>
> I checked the source code of the mapper task, and it seems that the
> output of one mapper task is one data file and one index file. Each
> reducer task then fetches its part of the mapper's output.
> I am wondering why the mapper output is not put into n files (n is
> the number of reducers), since the mapper task knows the Partitioner
> and the logic would be much simpler. Is there any performance
> consideration for putting the output into one file? Thanks.
>
>
> -- 
> Best Regards
>
> Jeff Zhang

