hadoop-mapreduce-user mailing list archives

From Virajith Jalaparti <virajit...@gmail.com>
Subject Re: Intermediate data size of Sort example
Date Wed, 29 Jun 2011 13:00:30 GMT
I would like to clarify my earlier question: each reducer reports
FILE_BYTES_READ as around 78GB, while both HDFS_BYTES_WRITTEN and
REDUCE_SHUFFLE_BYTES are around 25GB. So why is FILE_BYTES_READ 78GB and not
just 25GB?
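For what it's worth, here is a minimal sketch of one plausible explanation: if the 25GB shuffled to a reducer lands on local disk as many segments, and the merge factor (io.sort.factor, default 10 in 0.20.x) is smaller than the segment count, the reducer performs intermediate merge passes that re-read (and re-write) the same bytes, so FILE_BYTES_READ can be a small multiple of the shuffle size. The segment count and sizes below are assumptions for illustration, not values taken from this job.

```python
def merge_read_bytes(segment_bytes, factor):
    """Total local-disk bytes read by a multi-pass external merge.

    Each pass merges up to `factor` segments into one; every intermediate
    pass reads the bytes it merges and writes them back out as a larger
    segment. The final pass is streamed straight into reduce().
    (Simplified model: real Hadoop picks pass sizes more cleverly.)
    """
    segments = list(segment_bytes)
    total_read = 0
    while len(segments) > factor:
        merged, segments = segments[:factor], segments[factor:]
        total_read += sum(merged)     # intermediate pass: read these bytes...
        segments.append(sum(merged))  # ...and write one bigger segment
    total_read += sum(segments)       # final merge feeds the reducer
    return total_read

# Assumed example: 25GB arriving as 100 segments of 0.25GB, factor 10.
gb = 1 << 30
reads = merge_read_bytes([int(0.25 * gb)] * 100, factor=10)
print(reads / gb)  # -> 50.0: the data is read twice even in this simple case
```

Under these assumed numbers the merge alone doubles the bytes read; with more segments (or re-reads of spilled map output counted into the same counter) a ratio near 3x, i.e. ~78GB for a 25GB shuffle, does not seem implausible.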


On Wed, Jun 29, 2011 at 10:29 AM, Virajith Jalaparti

> Hi,
> I was running the Sort example in Hadoop 0.20.2
> (hadoop-0.20.2-examples.jar) over an input data size of 100GB (generated
> using randomwriter) with 800 mappers (I was using a 128MB HDFS block size)
> and 4 reducers, on a 3-machine cluster with 2 slave nodes. While the input
> and output were 100GB, I found that the intermediate data sent to each
> reducer was around 78GB, making the total intermediate data around 310GB. I
> don't really understand why there is an increase in data size, given that
> the sort example just uses the identity mapper and identity reducer.
> Could someone please help me out with this?
> Thanks!!
