hadoop-mapreduce-user mailing list archives

From Virajith Jalaparti <virajit...@gmail.com>
Subject Intermediate data size of Sort example
Date Wed, 29 Jun 2011 09:29:18 GMT

I was running the Sort example in Hadoop 0.20.2 (hadoop-0.20.2-examples.jar)
over an input data set of 100GB (generated using randomwriter), with
800 mappers (I was using a 128MB HDFS block size) and 4 reducers, on a
3-machine cluster with 2 slave nodes. While the input and output were both
100GB, I found that the intermediate data sent to each reducer was around
78GB, making the total intermediate data around 310GB. I don't really
understand why there is an increase in data size, given that the Sort
example just uses the identity mapper and identity reducer.
Could someone please help me out with this?
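For reference, a quick sanity check of the sizes quoted above, using only the numbers given in the message (4 reducers, roughly 78GB shuffled to each, against a 100GB input):

```python
# Numbers as reported in the message above.
input_gb = 100        # size of the randomwriter-generated input
reducers = 4          # number of reduce tasks
per_reducer_gb = 78   # approximate intermediate data shuffled to each reducer

# Total intermediate data across all reducers.
total_intermediate_gb = reducers * per_reducer_gb

# How much larger the intermediate data is than the input.
expansion_factor = total_intermediate_gb / input_gb

print(total_intermediate_gb)            # 312 (the "around 310GB" above)
print(round(expansion_factor, 1))       # 3.1
```

So the shuffle volume is roughly 3x the input, which is the discrepancy being asked about.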
