Subject: Re: Mapper Record Spillage
From: Hans Uhlig <huhlig@uhlisys.com>
To: mapreduce-user@hadoop.apache.org
Date: Sat, 10 Mar 2012 20:08:30 -0800

I am attempting to specify this for a single job during its creation/submission, not via the general cluster configuration. I am using the new API, so I am adding the values to the Configuration passed into new Job().

2012/3/10 WangRamon <ramon_wang@hotmail.com>:
> How many map/reduce task slots do you have on each node? If the total
> number is 10, then you will use 10 * 4096 MB of memory when all tasks are
> running, which is more than the 32 GB of total memory you have on each
> node.
>
> ------------------------------
> Date: Sat, 10 Mar 2012 20:00:13 -0800
> Subject: Mapper Record Spillage
> From: huhlig@uhlisys.com
> To: mapreduce-user@hadoop.apache.org
>
> I am attempting to speed up a mapping process whose input is GZIP-compressed
> CSV files. The files range from 1-2 GB, and I am running on a cluster where
> each node has a total of 32 GB of memory available. I have attempted to
> tweak mapred.map.child.jvm.opts with -Xmx4096mb and io.sort.mb to 2048 to
> accommodate the size, but I keep getting Java heap errors or other
> memory-related problems. My row count per mapper is well below the
> Integer.MAX_VALUE limit by several orders of magnitude, and the box is NOT
> using anywhere close to its full memory allotment.
> How can I specify that this map task can have 3-4 GB of memory for the
> collection, partition, and sort process without constantly spilling records
> to disk?
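For reference, a minimal sketch of what the per-job setup described above could look like with the new (org.apache.hadoop.mapreduce) API. Property names are from Hadoop 0.20/1.x and vary by version (the map-specific mapred.map.child.java.opts only exists in later releases; older ones use mapred.child.java.opts); the job name and mapper wiring are illustrative placeholders, not from the thread. Note two likely culprits in the reported settings: the JVM heap suffix must be "m", not "mb" (the JVM rejects "-Xmx4096mb"), and io.sort.mb is backed by a single int-indexed byte array, so 2048 is over its limit.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpillTuningJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Heap for each map task JVM. The suffix must be "m" -- a value like
        // "-Xmx4096mb" is invalid and the task JVM fails to honor it.
        conf.set("mapred.map.child.java.opts", "-Xmx4096m");

        // In-memory sort buffer in MB. It is allocated as one byte[], so it
        // must stay below 2048 and must fit comfortably inside the task heap.
        conf.setInt("io.sort.mb", 1024);

        // Optionally spill later: start spilling at 90% full instead of 80%.
        conf.setFloat("io.sort.spill.percent", 0.90f);

        // "csv-import" is a placeholder job name for this sketch.
        Job job = new Job(conf, "csv-import");
        // ... set mapper class, input/output formats, and paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because these are plain Configuration keys, the same values can also be passed per job on the command line (e.g. via -D with GenericOptionsParser) without touching the cluster-wide mapred-site.xml.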
