hadoop-mapreduce-user mailing list archives

From Hans Uhlig <huh...@uhlisys.com>
Subject Mapper Record Spillage
Date Sun, 11 Mar 2012 04:00:13 GMT
I am attempting to speed up a mapping process whose input is GZIP-compressed
CSV files. The files range from 1 to 2 GB, and I am running on a cluster where
each node has a total of 32 GB of memory available. I have attempted to set
mapred.map.child.jvm.opts to -Xmx4096mb and io.sort.mb to 2048 to accommodate
the size, but I keep getting Java heap errors or other memory-related
problems. My row count per mapper is several orders of magnitude below the
Integer.MAX_VALUE limit, and the box is NOT using anywhere close to its
full memory allotment. How can I specify that this map task can have 3-4 GB
of memory for the collection, partition, and sort process without constantly
spilling records to disk?
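
For reference, here is a rough sanity check of the values quoted above. It is
a sketch under assumptions, not a confirmed diagnosis. It assumes a Hadoop
0.20/1.x-era cluster, where the per-map JVM options property is normally
spelled mapred.map.child.java.opts, the heap flag takes the form -Xmx4096m,
and MapTask rejects any io.sort.mb that does not fit in 11 bits (so 2047 is the
largest accepted value). The class name SortBufferCheck, its methods, and the
1024 MB headroom figure are hypothetical, chosen only for illustration.

```java
public class SortBufferCheck {

    // Mirrors the mask test Hadoop 1.x MapTask applies to io.sort.mb
    // (assumption: the cluster runs a release that still has this check).
    static boolean isValidSortMb(int sortMb) {
        return (sortMb & 0x7FF) == sortMb;
    }

    // True if the sort buffer is valid and fits in the task heap while
    // leaving headroomMb for the mapper's own objects (illustrative rule).
    static boolean fitsInHeap(int sortMb, int heapMb, int headroomMb) {
        return isValidSortMb(sortMb) && sortMb + headroomMb <= heapMb;
    }

    public static void main(String[] args) {
        int heapMb = 4096;      // from -Xmx4096m
        int headroomMb = 1024;  // hypothetical allowance, not a Hadoop setting

        System.out.println("io.sort.mb=2048 accepted: " + isValidSortMb(2048));
        System.out.println("io.sort.mb=2047 accepted: " + isValidSortMb(2047));
        System.out.println("2047 MB buffer fits in heap: "
                + fitsInHeap(2047, heapMb, headroomMb));
    }
}
```

If those assumptions hold for the cluster in question, io.sort.mb=2048 would be
rejected outright, regardless of how much memory the node has free.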
