hadoop-common-dev mailing list archives

From He Chen <airb...@gmail.com>
Subject Re: Question about how Hadoop stores intermediate results
Date Sun, 25 Sep 2011 21:01:11 GMT
Hi Arun and Harsh J

Thank you for your replies.

Yes, there will be two files in the end. But while the map task is running,
there are more than two.

The scenario I mentioned before will not occur with Hadoop's default
partitioner. But if some partitioner does lead to the problem above, is there
any security policy to prevent it?

We all know that an unbalanced key distribution can lead to differences in
reduce tasks' execution times, even in a homogeneous environment. It would be
easier to rearrange unbalanced keys if each key occupied its own file.



On Sun, Sep 25, 2011 at 2:55 PM, Arun C Murthy <acm@hortonworks.com> wrote:

> There is only one file per-map. Actually two, an output file and an index
> file to quickly get the offset/length for a given reducer.
> The index file is also cached in memory for performance.
> Arun
> On Sep 25, 2011, at 10:00 AM, He Chen wrote:
> > Hi everyone
> >
> > According to my understanding of Hadoop, it saves a MapReduce job's
> > intermediate results into files on the mapper's hard drive. Each key will
> > occupy a file. I am curious what will happen if the mapper's hard drive
> > does not have enough inodes to save the generated keys, because every
> > file needs an inode.
> >
> > Best wishes!
> >
> > Chen
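
As a rough illustration of Arun's point in the thread (one output file per map, plus an index giving the offset/length for each reducer), here is a minimal Python sketch. It is not Hadoop's actual on-disk format or code: the function names, the tab-separated record layout, and the use of crc32 as the hash are all assumptions made for the example. It only shows that the number of index entries depends on the number of reducers, not on the number of distinct keys.

```python
import io
import zlib


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner; crc32 is used only
    # so that the result is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def write_map_output(records, num_reducers):
    """Return (data, index): one byte blob and one (offset, length) per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))

    out = io.BytesIO()
    index = []
    for bucket in buckets:
        start = out.tell()
        # Records for one reducer are written contiguously, sorted by key.
        for key, value in sorted(bucket):
            out.write(f"{key}\t{value}\n".encode("utf-8"))
        index.append((start, out.tell() - start))
    return out.getvalue(), index


def read_partition(data, index, reducer):
    # Look up the reducer's slice through the index instead of scanning the blob.
    offset, length = index[reducer]
    chunk = data[offset:offset + length].decode("utf-8")
    return [tuple(line.split("\t")) for line in chunk.splitlines()]


# 1000 distinct keys still produce a single blob and a 4-entry index.
records = [(f"key{i}", str(i)) for i in range(1000)]
data, index = write_map_output(records, num_reducers=4)
```

In this sketch, a skewed key distribution shows up as uneven lengths in `index` rather than as extra files, so inode usage per map stays constant however many keys are generated.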
