hadoop-mapreduce-user mailing list archives

From Ajay Srivastava <Ajay.Srivast...@guavus.com>
Subject Spilled records
Date Wed, 23 Jan 2013 10:34:44 GMT

I was tuning a MapReduce job to reduce the number of spills and reached a stage where the following
counters are all equal:

Spilled Records in map = Spilled Records in reduce = Combine output records = Reduce input records

I do not see any lines in the mapper logs containing the following strings:
1. Spilling map output: record full
2. Spilling map output: buffer full

I see only this string:
1. Finished spill 0 (note the 0 at the end)
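
A log check like the one described above could be sketched as follows. This is only an illustration, assuming the mapper task log has already been read into a list of lines; the class and method names (SpillLogCheck, countMarkers) are hypothetical and not part of Hadoop, and the sample log lines are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpillLogCheck {
    // Marker strings quoted in this message.
    static final String[] MARKERS = {
        "Spilling map output: record full",
        "Spilling map output: buffer full",
        "Finished spill"
    };

    // Counts how many log lines contain each marker string.
    static Map<String, Integer> countMarkers(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String marker : MARKERS) {
            counts.put(marker, 0);
        }
        for (String line : logLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    counts.put(marker, counts.get(marker) + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative sample only, not copied from a real task log.
        List<String> sample = Arrays.asList(
            "INFO MapTask: some unrelated line",
            "INFO MapTask: Finished spill 0");
        System.out.println(countMarkers(sample));
    }
}
```

A count of zero for both "Spilling map output" markers together with a single "Finished spill" line matches the situation described here.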

I am confused. Can someone please explain what is going on?

1. Neither the buffer nor the record space filled up, yet there are spills. Is the mapper writing
its records to disk at the end, to be consumed by the reducers, and is that why I see these spills?
2. Why does the combiner run if there were no spills? If my guess in point 1 is correct,
will the combiner not run when the number of spills < min.num.spills.for.combine?
3. Why are spills counted in the reducer stats?
4. Is there a way to tell the mapper not to write its final output to disk, so that the reducers
fetch the data from the mapper's main memory?

Ajay Srivastava 