hadoop-mapreduce-user mailing list archives

From WangRamon <ramon_w...@hotmail.com>
Subject RE: Mapper Record Spillage
Date Sun, 11 Mar 2012 04:05:14 GMT

How many map/reduce task slots do you have on each node? If the total number is 10, then
you will use 10 * 4096 MB of memory when all tasks are running, which is more than the
32 GB of total memory you have on each node.
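For illustration, a back-of-envelope check of that budget (the slot counts below are assumptions, not values from the original post; the real limits are set per TaskTracker via mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum):

// Rough per-node memory budget check; the numbers are assumptions for illustration.
public class SlotBudget {
    public static void main(String[] args) {
        int mapSlots = 8;          // assumed mapred.tasktracker.map.tasks.maximum
        int reduceSlots = 2;       // assumed mapred.tasktracker.reduce.tasks.maximum
        int heapPerTaskMb = 4096;  // from -Xmx4096m
        int nodeRamMb = 32 * 1024; // 32 GB node

        int worstCaseMb = (mapSlots + reduceSlots) * heapPerTaskMb;
        System.out.printf("Worst-case task heap: %d MB of %d MB node RAM%n",
                worstCaseMb, nodeRamMb);
        // 10 slots * 4096 MB = 40960 MB > 32768 MB, so either the slot count
        // or the per-task heap has to come down (and the DataNode/TaskTracker
        // daemons need their own share of that 32 GB as well).
    }
}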
 Date: Sat, 10 Mar 2012 20:00:13 -0800
Subject: Mapper Record Spillage
From: huhlig@uhlisys.com
To: mapreduce-user@hadoop.apache.org

I am attempting to speed up a mapping process whose input is GZIP-compressed CSV files. The
files range from 1-2 GB, and I am running on a cluster where each node has a total of 32 GB of
memory available to use. I have attempted to tweak mapred.map.child.jvm.opts with -Xmx4096mb and
io.sort.mb to 2048 to accommodate the size, but I keep getting Java heap errors or other memory-related
problems. My row count per mapper is below the Integer.MAX_VALUE limit by several
orders of magnitude, and the box is NOT using anywhere close to its full memory allotment.
How can I specify that this map task can have 3-4 GB of memory for the collection, partition
and sort process without constantly spilling records to disk?
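One way to make the intent explicit per job is a minimal sketch like the following (not a tested configuration; the property names are the Hadoop 1.x ones, and the exact heap and buffer values are assumptions). The key point is that io.sort.mb has to fit well inside the map-task heap:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CsvImportJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Map-task JVM heap; note the JVM flag takes "m", not "mb".
        conf.set("mapred.map.child.java.opts", "-Xmx3072m");

        // In-memory sort buffer; it must fit comfortably inside the map heap,
        // and io.sort.mb is capped below 2048 in Hadoop 1.x.
        conf.setInt("io.sort.mb", 1536);

        // Spill only when the buffer is nearly full.
        conf.setFloat("io.sort.spill.percent", 0.95f);

        Job job = new Job(conf, "csv-import");
        // ... set mapper, input/output formats and paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}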