hadoop-user mailing list archives

From java8964 <java8...@hotmail.com>
Subject RE: spilled records
Date Fri, 16 May 2014 14:28:08 GMT
Your first understanding is not correct. Where did you get that interpretation from the book?
About the spilled records counter: every record a mapper outputs is spilled at least once,
so in the ideal scenario these two numbers should be equal. If they are not, and the spilled
count is much larger than the map output records count, then you may need to adjust the
"io.sort.mb" configuration.
Yong 

From: yu_libo@hotmail.com
To: user@hadoop.apache.org
Subject: spilled records
Date: Thu, 8 May 2014 21:17:35 -0400




Hi, 

According to "Hadoop: The Definitive Guide", when mapreduce.job.shuffle.input.buffer.percent
is large enough, the map outputs are copied directly into the reduce JVM's memory.

I set this parameter to 0.5, which is large enough to hold the map outputs, but the spilled
records counter is still the same as the reduce input records counter. Does anybody know why? Thanks.
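
(For reference only, a hedged sketch of how that fraction can be set on a job; the property
name is copied from the sentence above, and on newer Hadoop releases the key is
mapreduce.reduce.shuffle.input.buffer.percent. The class name is a placeholder.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ShuffleBufferExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Fraction of the reduce task's heap used to hold shuffled map
        // outputs in memory; 0.5 matches the value tried in this message.
        conf.setFloat("mapreduce.job.shuffle.input.buffer.percent", 0.5f);
        Job job = Job.getInstance(conf, "shuffle-buffer-example");
        // ... remaining job setup unchanged ...
      }
    }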

Libo

