hadoop-user mailing list archives

From Dieter De Witte <drdwi...@gmail.com>
Subject Re: is mapreduce.task.io.sort.mb control both map merge buffer and reduce merge buffer?
Date Thu, 12 Dec 2013 04:57:01 GMT
This parameter sets the size of the map-side sort buffer: each time the
buffer fills up to its spill threshold, its contents are sorted and written
to disk as a spill. The reduce side has its own range of parameters. I am
not sure why you would increase these buffer sizes, since they eat into your
heap; it depends on what you mean by a heavy job. In my case a heavy job
needed a lot of heap, so I scaled down the buffers for in-memory merging. To
learn more about tuning the shuffle and sort phase, check this reference:

https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort

Reading this will be an eye-opener.
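
To put rough numbers on the heap trade-off, here is a minimal sketch in plain
Java. It is only arithmetic, not Hadoop code: the class SortBufferBudget, its
method names, and the 2048 MB task heap are hypothetical and chosen for
illustration. The 0.80 spill fraction is the documented default of
mapreduce.map.sort.spill.percent; check the value your cluster actually uses.

```java
// Hypothetical helper, not part of Hadoop: plain arithmetic on buffer sizes.
final class SortBufferBudget {

    // Fraction (0..1) of the task JVM heap taken by the map-side sort
    // buffer configured through mapreduce.task.io.sort.mb.
    static double sortBufferHeapFraction(int ioSortMb, int taskHeapMb) {
        if (ioSortMb <= 0 || taskHeapMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (double) ioSortMb / taskHeapMb;
    }

    // Buffer fill level, in MB, at which a spill to disk starts.
    // spillPercent corresponds to mapreduce.map.sort.spill.percent.
    static double spillThresholdMb(int ioSortMb, double spillPercent) {
        return ioSortMb * spillPercent;
    }

    // Heap, in MB, left for the job's own objects once the sort buffer
    // has been allocated. A negative result means the buffer does not fit.
    static int heapLeftMb(int ioSortMb, int taskHeapMb) {
        return taskHeapMb - ioSortMb;
    }

    public static void main(String[] args) {
        int taskHeapMb = 2048;      // assumed task heap, for illustration only
        double spillPercent = 0.80; // default of mapreduce.map.sort.spill.percent

        // Compare the default buffer (100 MB) with the proposed 1 GB buffer.
        for (int ioSortMb : new int[] {100, 1024}) {
            System.out.printf(
                "io.sort.mb=%d: %.0f%% of heap, spill at %.0f MB, %d MB left%n",
                ioSortMb,
                100 * sortBufferHeapFraction(ioSortMb, taskHeapMb),
                spillThresholdMb(ioSortMb, spillPercent),
                heapLeftMb(ioSortMb, taskHeapMb));
        }
    }
}
```

Under these assumptions a 1 GB sort buffer takes half of every map task's
heap, which is the cost to weigh before raising the value.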


2013/12/12 ch huang <justlooks@gmail.com>

> Hi, mailing list:
> Because of a heavy job on the reduce tasks, I am trying to increase the
> buffer size for the sort merge. I wonder: if I increase
> mapreduce.task.io.sort.mb from 100 MB (the default value) to 1 GB, will
> each map task's sort merge buffer also become 1 GB?
>
