hadoop-mapreduce-user mailing list archives

From: Harsh J <ha...@cloudera.com>
Subject: Re: Mapper Record Spillage
Date: Sun, 11 Mar 2012 04:41:30 GMT
Hans,

It's possible you have a typo here: mapred.map.child.jvm.opts -
such a property does not exist. Perhaps you wanted
"mapred.map.child.java.opts"?

Additionally, the computation you need to do is: (# of map slots on a
TT * per-map-task heap requirement) should stay below (total RAM -
2-3 GB reserved for the OS and daemons). With your 4 GB requirement, I
guess you can support a max of 6-7 slots per machine (not counting
reducer heap requirements running in parallel).
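
As a back-of-the-envelope check on that (the 3 GB reserve below is an assumed
figure for the OS and Hadoop daemons, not a number stated in this thread):

// Rough per-node map slot estimate for 32 GB nodes and 4 GB map-task heaps.
public class SlotEstimate {
  public static void main(String[] args) {
    int totalRamGb = 32; // RAM per node, from the original question
    int reservedGb = 3;  // assumed headroom for the OS and Hadoop daemons
    int mapHeapGb = 4;   // per-map-task -Xmx
    System.out.println((totalRamGb - reservedGb) / mapHeapGb); // prints 7
  }
}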

On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig <huhlig@uhlisys.com> wrote:
> I am attempting to speed up a mapping process whose input is GZIP-compressed
> CSV files. The files range from 1-2 GB, and I am running on a cluster where
> each node has a total of 32 GB of memory available to use. I have attempted
> to tweak mapred.map.child.jvm.opts with -Xmx4096mb and io.sort.mb to 2048 to
> accommodate the size, but I keep getting Java heap errors or other
> memory-related problems. My row count per mapper is well below the
> Integer.MAX_VALUE limit by several orders of magnitude, and the box is NOT
> using anywhere close to its full memory allotment. How can I specify that
> this map task can have 3-4 GB of memory for the collection, partition and
> sort process without constantly spilling records to disk?



-- 
Harsh J
