hadoop-common-user mailing list archives

From Yu Li <car...@gmail.com>
Subject Re: Config
Date Wed, 24 Nov 2010 04:43:18 GMT
Hi William,

I think the best config parameter to try first is io.sort.factor, which sets
how many spill files are merged at once and so affects how much spilling and
re-merging to disk happens on both the map and reduce sides. The default
value of this parameter is 10; try raising it to 100 or more.
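
To see why a larger merge factor helps, here is a rough sketch in Python. It
assumes a simplified model in which every merge round combines up to
io.sort.factor segments into one; Hadoop's real merger schedules its passes
somewhat differently, and merge_rounds is a hypothetical helper, not a Hadoop
API.

```python
import math


def merge_rounds(num_segments, factor):
    """Count merge rounds needed to reduce num_segments to a single run.

    Simplified model: each round merges groups of up to `factor` segments,
    and every round except the last rewrites its data to disk.
    """
    if factor < 2:
        raise ValueError("merge factor must be at least 2")
    rounds = 0
    while num_segments > 1:
        num_segments = math.ceil(num_segments / factor)
        rounds += 1
    return rounds


# 50 spill files is an illustrative number, not a measurement.
for factor in (10, 100):
    print(f"factor={factor}: {merge_rounds(50, factor)} round(s)")
```

Under this model, 50 segments need two rounds at the default factor of 10 but
only one at 100, so the intermediate round of disk writes goes away.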

If spilling on the reduce side is still frequent, you could try raising
mapred.job.shuffle.input.buffer.percent along with the heap size in
mapred.child.java.opts, which may reduce disk spills in the shuffle
phase. The default value of mapred.job.shuffle.input.buffer.percent is 0.7
(a fraction of the task's maximum heap), and mapred.child.java.opts defaults
to -Xmx200m.

Note that increasing these values will also increase memory usage, so
we need to make sure memory doesn't become the system bottleneck.
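
As a back-of-the-envelope check, the arithmetic can be sketched in Python. The
slot counts, 800 MB heap, and 18 GB of RAM come from the quoted message below;
treating 18 GB as 18 * 1024 MB and ignoring JVM overhead beyond the heap are
assumptions, and both function names are made up for illustration.

```python
def shuffle_buffer_mb(heap_mb, buffer_percent=0.7):
    """Approximate in-memory shuffle buffer: a fraction of the task heap."""
    return round(heap_mb * buffer_percent)


def slot_memory_mb(map_slots, reduce_slots, heap_mb):
    """Heap committed if every map and reduce slot on a node is busy."""
    return (map_slots + reduce_slots) * heap_mb


total_ram_mb = 18 * 1024                 # assumption: 18 GB = 18432 MB
task_heap_mb = slot_memory_mb(8, 8, 800)
headroom_mb = total_ram_mb - task_heap_mb

print("shuffle buffer at -Xmx200m:", shuffle_buffer_mb(200), "MB")
print("shuffle buffer at -Xmx800m:", shuffle_buffer_mb(800), "MB")
print("heap for all 16 slots:", task_heap_mb, "MB")
print("left for daemons and OS:", headroom_mb, "MB")
```

With these numbers the task heaps alone take 12800 MB, leaving roughly 5.5 GB
for the DataNode, TaskTracker, and OS page cache, so any further heap increase
should be weighed against that headroom.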

Hope this helps.

On 24 November 2010 04:58, William <wtheisinger@gmail.com> wrote:

> We are currently modifying the configuration of our hadoop grid (250
> machines).  The machines are homogeneous and the specs are
> dual quad core cpu 18Gb ram 8x1tb drives
> currently we have set this up  -
> 8 reduce slots at 800mb
> 8 map slots at 800mb
> raised our io.sort.mb to 256mb
> we see a lot of spilling on both maps and reduces and I am wondering what
> other configs I should be looking into
> Thanks

Best Regards,
Li Yu
