hadoop-mapreduce-user mailing list archives

From Hans Uhlig <huh...@uhlisys.com>
Subject Re: Mapper Record Spillage
Date Sun, 11 Mar 2012 04:08:30 GMT
I am attempting to specify this for a single job during its
creation/submission, rather than through the general cluster configuration.
I am using the new API, so I am adding the values to the Configuration
passed into new Job().

2012/3/10 WangRamon <ramon_wang@hotmail.com>

>  How many map/reduce task slots do you have on each node? If the
> total number is 10, then you will use 10 * 4096 MB of memory when all tasks are
> running, which is more than the 32 GB of total memory you have on each node.
> ------------------------------
> Date: Sat, 10 Mar 2012 20:00:13 -0800
> Subject: Mapper Record Spillage
> From: huhlig@uhlisys.com
> To: mapreduce-user@hadoop.apache.org
> I am attempting to speed up a mapping process whose input is GZIP-compressed
> CSV files. The files range from 1 to 2 GB, and I am running on a cluster where
> each node has a total of 32 GB of memory available. I have attempted to tweak
> mapred.map.child.java.opts with -Xmx4096m and io.sort.mb to 2048 to accommodate
> the size, but I keep getting Java heap errors or other memory-related
> problems. My row count per mapper is below the Integer.MAX_VALUE limit
> by several orders of magnitude, and the box is NOT using anywhere close to its
> full memory allotment. How can I specify that this map task can have 3-4
> GB of memory for the collection, partition and sort process without constantly
> spilling records to disk?
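A minimal sketch of the arithmetic behind this thread, in plain Java with no Hadoop dependency. The class and method names are hypothetical. It assumes (1) that every map and reduce slot on a node may run a JVM with the full -Xmx heap at the same time, as the reply about slots suggests; (2) that the JVM heap flag takes a k, m, g or t suffix, so a value written as -Xmx4096mb is rejected by the JVM; and (3) that, as I understand the MapTask source in Hadoop 1.x, the map-side sort buffer is allocated inside the task heap and io.sort.mb values that do not fit in 11 bits (above 2047) are refused. Please check assumption (3) against the MapTask source for your Hadoop version.

```java
import java.util.Locale;

/**
 * Hypothetical sanity checks for the map-task memory settings discussed in this
 * thread. Not part of Hadoop; the limits encoded here are assumptions to verify
 * against the Hadoop version in use.
 */
public class MapMemoryCheck {

    /**
     * Converts a JVM heap option such as "-Xmx4096m" or "-Xmx4g" to megabytes.
     * Returns -1 for values the JVM would reject, e.g. the "mb" suffix.
     */
    static long parseXmxMb(String opt) {
        if (opt == null || !opt.startsWith("-Xmx") || opt.length() == 4) {
            return -1;
        }
        String v = opt.substring(4).toLowerCase(Locale.ROOT);
        char last = v.charAt(v.length() - 1);
        String digits = Character.isDigit(last) ? v : v.substring(0, v.length() - 1);
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
            return -1; // e.g. "4096mb": the remaining "4096m" is not all digits
        }
        long n = Long.parseLong(digits);
        switch (last) {
            case 'k': return n / 1024;
            case 'm': return n;
            case 'g': return n * 1024;
            case 't': return n * 1024 * 1024;
            // A bare number is a byte count; any other suffix is invalid.
            default:  return Character.isDigit(last) ? n / (1024 * 1024) : -1;
        }
    }

    /** Worst case: every task slot on the node runs a JVM with the full heap. */
    static long worstCaseHeapMb(int slotsPerNode, long heapMb) {
        return slotsPerNode * heapMb;
    }

    /**
     * Assumption: the sort buffer lives inside the task heap, and MapTask
     * rejects io.sort.mb values above 2047 (the 11-bit check).
     */
    static boolean sortBufferFits(int ioSortMb, long heapMb) {
        return ioSortMb > 0 && ioSortMb <= 2047 && ioSortMb < heapMb;
    }

    public static void main(String[] args) {
        long nodeMb = 32 * 1024;               // 32 GB per node
        int slots = 10;                        // example slot count from the reply
        long heapMb = parseXmxMb("-Xmx4096m"); // 4096 MB per task JVM

        System.out.println("-Xmx4096mb parses to " + parseXmxMb("-Xmx4096mb"));
        System.out.println("worst-case heap " + worstCaseHeapMb(slots, heapMb)
                + " MB vs node " + nodeMb + " MB");
        System.out.println("io.sort.mb=2048 fits: " + sortBufferFits(2048, heapMb));
        System.out.println("io.sort.mb=1024 fits: " + sortBufferFits(1024, heapMb));
    }
}
```

With the numbers from this thread, ten slots at 4096 MB each add up to 40960 MB, which exceeds the 32768 MB on a node. Under assumption (3), io.sort.mb=2048 would be refused outright even with a 4 GB heap, while a value such as 1024 leaves room for the mapper's own objects.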
