hadoop-common-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: spilled records
Date Fri, 16 May 2014 14:28:08 GMT
Your first understanding is not correct. Where did you get that interpretation from the book?
About the #spilled records counter: every record output by a mapper is spilled to disk at least
once, so in the ideal scenario these two numbers should be equal. If they are not, and the spilled
count is much larger than the mappers' output record count, then you may need to increase the
"io.sort.mb" configuration.
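
As a sketch of the tuning suggested above (io.sort.mb is the older property name; in MRv2
it became mapreduce.task.io.sort.mb; the 256 MB value below is only an illustration, not a
recommendation), a mapred-site.xml fragment might look like:

```xml
<!-- mapred-site.xml: illustrative value, assuming MRv2 property names -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value>
  <description>Size in MB of the in-memory map-side sort buffer
    (io.sort.mb in older releases). A larger buffer lets more map
    output stay in memory before each spill, reducing the number of
    extra spill-and-merge passes.</description>
</property>
```

Each map task still writes at least one spill file when it finishes, which is why the spilled
records counter never drops below the map output records count.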

From: yu_libo@hotmail.com
To: user@hadoop.apache.org
Subject: spilled records
Date: Thu, 8 May 2014 21:17:35 -0400


According to "Hadoop: The Definitive Guide", when mapreduce.job.shuffle.input.buffer.percent
is large enough, the map outputs are copied directly into the reduce JVM's memory.

I set this parameter to 0.5, which is large enough to hold the map outputs, but #spilled records
is still the same as reduce input records. Does anybody know why? Thanks.

