hadoop-common-user mailing list archives

From maha <m...@umail.ucsb.edu>
Subject Re: Spilled Records
Date Tue, 22 Feb 2011 06:49:47 GMT
Thank you, Saurabh, but the following settings didn't change the number of spilled records:

conf.set("mapred.job.shuffle.merge.percent", ".9");    // instead of .66
conf.set("mapred.inmem.merge.threshold", "10000000");  // instead of 1000

Is it because my memory is 4 GB?

I'm using pseudo-distributed mode.

Thank you,
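
One possible reason the count stays the same can be sketched with a small model of the map-side sort buffer. This is only a rough illustration, not Hadoop's actual code. It assumes the 0.20-era layout, where io.sort.record.percent of the buffer holds about 16 bytes of bookkeeping per record, and a spill starts once either region passes io.sort.spill.percent. It also assumes no combiner and ignores the reduce side. The class name, method names, and the workload numbers (1,000,000 records, 50 MB of map output) are hypothetical.

```java
// Simplified, hypothetical model of map-side spilling; not Hadoop's implementation.
class SpillEstimate {
    // Assumed bookkeeping cost per record in the 0.20-era sort buffer.
    static final int METADATA_BYTES_PER_RECORD = 16;

    // Estimates how many spill files one map task writes.
    static long estimateSpills(int ioSortMb, double recordPercent, double spillPercent,
                               long records, long outputBytes) {
        long bufferBytes = (long) ioSortMb << 20;            // io.sort.mb in bytes
        double metaBytes = bufferBytes * recordPercent;      // region for record metadata
        double recordLimit = (metaBytes / METADATA_BYTES_PER_RECORD) * spillPercent;
        double dataLimit = (bufferBytes - metaBytes) * spillPercent;
        long byRecords = (long) Math.ceil(records / recordLimit);
        long byData = (long) Math.ceil(outputBytes / dataLimit);
        // Map output is flushed to disk at least once, even when it fits in memory.
        return Math.max(1, Math.max(byRecords, byData));
    }

    // Lower bound on the map-side "Spilled Records" counter without a combiner:
    // every record is written once, and once more if several spills must be merged.
    static long minSpilledRecords(long spills, long records) {
        return spills <= 1 ? records : 2 * records;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;       // hypothetical map output records
        long bytes = 50L << 20;          // hypothetical 50 MB of map output
        int[] sortMb = {100, 200, 100};
        double[] recordPercent = {0.05, 0.05, 0.20};
        for (int i = 0; i < sortMb.length; i++) {
            long spills = estimateSpills(sortMb[i], recordPercent[i], 0.80, records, bytes);
            System.out.println("io.sort.mb=" + sortMb[i]
                + " io.sort.record.percent=" + recordPercent[i]
                + " spills=" + spills
                + " spilledRecords>=" + minSpilledRecords(spills, records));
        }
    }
}
```

In this model, doubling io.sort.mb from 100 to 200 lowers the estimated number of spills from 4 to 2, but the spilled-record lower bound stays at twice the record count because more than one spill still has to be merged. Only when the whole output fits in a single spill (the third setting) does the bound drop to the record count itself, and it never goes below that.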

On Feb 21, 2011, at 7:46 PM, Saurabh Dutta wrote:

> Hi Maha,
> Spilled records have to do with the transient data produced during the map and reduce
> operations. Note that it's not just the map operations that generate spilled records. When
> the in-memory buffer (controlled by mapred.job.shuffle.merge.percent) fills up, or the
> number of map outputs reaches the threshold (mapred.inmem.merge.threshold), the buffer is
> merged and spilled to disk.
> You are going in the right direction by tuning the io.sort.mb parameter; try increasing
> it further. If that still doesn't work, try io.sort.factor and fs.inmemory.size.mb. Also
> try the other two variables that I mentioned earlier.
> Let us know what worked for you.
> Sincerely,
> Saurabh Dutta
> Impetus Infotech India Pvt. Ltd.,
> Sarda House, 24-B, Palasia, A.B.Road, Indore - 452 001
> Phone: +91-731-4269200 4623
> Fax: + 91-731-4071256
> Email: saurabh.dutta@impetus.co.in
> www.impetus.com
> ________________________________________
> From: maha [maha@umail.ucsb.edu]
> Sent: Tuesday, February 22, 2011 8:21 AM
> To: common-user
> Subject: Spilled Records
> Hello everyone,
> Do spilled records mean that the sort buffer is not large enough to sort all the input
> records, so some records are written to local disk?
> If so, I tried raising io.sort.mb from the default 100 to 200 and there was still the
> same number of spilled records. Why?
> Could changing io.sort.record.percent to .9 instead of .8 produce unexpected exceptions?
> Thank you,
> Maha
