hadoop-common-dev mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Question about how Hadoop stores intermediate results
Date Mon, 26 Sep 2011 04:39:20 GMT

On Sep 25, 2011, at 2:01 PM, He Chen wrote:

> Hi Arun and Harsh J
> 
> Thank you for your replies.
> 
> Yes, there will be two in the end. But while the map is running, there are
> more than two.
> 
> The scenario I mentioned before will not occur with the Hadoop default
> partitioner. But if a partitioner leads to the above problem, is there any
> security policy that prevents this?
> 

Irrespective of the partitioner used, each 'spill' writes all of its keys/values to a single
file, after the records in the sort buffer have been sorted.

You could have multiple spills, but each spill holds lots of keys/values - we never create a
file per record. You'd very quickly run out of inodes.
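To illustrate, here is a simplified Python sketch of the idea. It is not Hadoop's actual
code: the names run_map_side, partition_for and buffer_limit are made up for this example,
and the buffer limit counts records rather than bytes. The point is only that the number
of files tracks the number of spills, not the number of records or partitions.

```python
import os
import tempfile


def partition_for(key, num_reducers):
    # Hypothetical stand-in for a partitioner; any function of the key
    # would do, since the partitioner does not affect the file count.
    return sum(key.encode("utf-8")) % num_reducers


def run_map_side(records, num_reducers, buffer_limit, out_dir):
    """Buffer (key, value) pairs and write exactly one file per spill."""
    buffer = []
    spill_files = []

    def spill():
        # Sort by (partition, key) so each reducer's records are contiguous
        # inside the single spill file.
        buffer.sort(key=lambda rec: (rec[0], rec[1]))
        path = os.path.join(out_dir, "spill%d.out" % len(spill_files))
        with open(path, "w", encoding="utf-8") as f:
            for part, key, value in buffer:
                f.write("%d\t%s\t%s\n" % (part, key, value))
        spill_files.append(path)
        buffer.clear()

    for key, value in records:
        buffer.append((partition_for(key, num_reducers), key, value))
        if len(buffer) >= buffer_limit:
            spill()
    if buffer:
        # Flush whatever is left once the map input is exhausted.
        spill()
    return spill_files
```

With 10 records and a buffer limit of 4, this writes 3 spill files (4 + 4 + 2 records),
whatever partition_for returns.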

In the very early days we had a file per reducer, and that caused huge issues, never mind a
file per record.

Arun