hadoop-common-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Question about how Hadoop stores intermediate results
Date Sun, 25 Sep 2011 19:50:07 GMT
Chen,

Files are stored based on the reducer partitions, not per key. As a
result, there are far fewer files than you might expect. The keys are
kept sorted inside the partitioned files, so you do not lose your key
groupings either.
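To make this concrete, here is a toy in-memory model in Java (not Hadoop code). One map task's output keys are bucketed by reducer partition using the same arithmetic as Hadoop's default HashPartitioner, and each bucket keeps its keys sorted with all values for a key stored together. The class and method names are hypothetical, and the sketch ignores what real map output goes through (an in-memory buffer, spills to disk, and merges).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model (hypothetical, not Hadoop code): one bucket per reducer partition.
public class PartitionedMapOutput {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // non-negative hash code modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Buckets keys by partition. Each TreeMap keeps its keys sorted, and
    // all values emitted for one key stay together under that key.
    static List<TreeMap<String, List<Integer>>> partition(List<String> keys, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> buckets = new ArrayList<>();
        for (int p = 0; p < numReduceTasks; p++) {
            buckets.add(new TreeMap<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReduceTasks))
                   .computeIfAbsent(key, k -> new ArrayList<>())
                   .add(1); // placeholder value: one "1" per emitted record
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("key-" + i);
        }
        List<TreeMap<String, List<Integer>>> buckets = partition(keys, 4);
        System.out.println("distinct keys: " + keys.size() + ", partitions: " + buckets.size());
        for (int p = 0; p < buckets.size(); p++) {
            TreeMap<String, List<Integer>> bucket = buckets.get(p);
            String first = bucket.isEmpty() ? "(empty)" : bucket.firstKey();
            System.out.println("partition " + p + ": " + bucket.size() + " keys, first key " + first);
        }
    }
}
```

With 100,000 distinct keys and 4 reducers, the model produces 4 buckets rather than 100,000: the number of partitions follows the number of reduce tasks, not the number of keys.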

See the Partitioner, which is responsible for partitioning your map
outputs:
(http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Partitioner)
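The Partitioner's job can be sketched as one function that maps an intermediate (key, value) pair and the number of reduce tasks to a partition index. The SimplePartitioner interface below is a hypothetical stand-in that mirrors the shape of Hadoop's getPartition(key, value, numPartitions); HASH uses the same arithmetic as the default HashPartitioner, and FIRST_CHAR is an invented custom partitioner for illustration.

```java
import java.util.List;

public class PartitionerSketch {

    // Hypothetical stand-in for the Partitioner contract: return a
    // reducer index in [0, numPartitions) for an intermediate pair.
    interface SimplePartitioner<K, V> {
        int getPartition(K key, V value, int numPartitions);
    }

    // Same arithmetic as Hadoop's default HashPartitioner.
    static final SimplePartitioner<String, Integer> HASH =
        (key, value, numPartitions) -> (key.hashCode() & Integer.MAX_VALUE) % numPartitions;

    // Hypothetical custom partitioner: route keys by their first character,
    // so keys sharing a first letter land on the same reducer.
    static final SimplePartitioner<String, Integer> FIRST_CHAR =
        (key, value, numPartitions) -> key.isEmpty() ? 0 : key.charAt(0) % numPartitions;

    public static void main(String[] args) {
        int reducers = 3;
        for (String key : List.of("apple", "avocado", "banana", "cherry", "apple")) {
            System.out.println(key
                + " -> hash partition " + HASH.getPartition(key, 1, reducers)
                + ", first-char partition " + FIRST_CHAR.getPartition(key, 1, reducers));
        }
    }
}
```

Because both functions here look only at the key (and the reducer count), every map task sends a given key to the same partition, which is what lets a single reducer see all values for that key.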

On Sun, Sep 25, 2011 at 10:30 PM, He Chen <airbots@gmail.com> wrote:
> Hi everyone
>
> According to my understanding of Hadoop, it saves a MapReduce job's
> intermediate results into files on the mapper's hard drive. Each key will
> occupy a file. I am curious what will happen if the mapper's hard drive does
> not have enough inodes to save the generated keys, because every file needs
> an inode.
>
> Best wishes!
>
> Chen
>



-- 
Harsh J
