Subject: Re: config for high memory jobs does not work, please help.
From: Arun C Murthy
Date: Fri, 18 Jan 2013 13:18:19 -0800
To: user@hadoop.apache.org

Take a look at the CapacityScheduler and 'High RAM' jobs, whereby you can run M map slots per node and request, per-job, that you want N (where N = max(1, N, M)).

Some more info:
http://hadoop.apache.org/docs/stable/capacity_scheduler.html#Resource+based+scheduling
http://hortonworks.com/blog/understanding-apache-hadoops-capacity-scheduler/

hth,
Arun

On Jan 18, 2013, at 12:05 PM, Shaojun Zhao wrote:

> Dear all,
>
> I know it is best to use a small amount of memory in the mapper and
> reducer. However, sometimes it is hard to do so. For example, in machine
> learning algorithms, it is common to load the model into memory in the
> mapper step. When the model is big, I have to allocate a lot of memory
> for the mapper.
>
> Here is my question: how can I configure Hadoop so that it does not fork
> too many mappers and run out of physical memory?
>
> My machines have 24G each, and I have 100 of them. Each time, Hadoop
> forks 6 mappers on each machine, no matter what config I use. I really
> want to reduce it to whatever number I want, for example, just 1
> mapper per machine.
>
> Here are the configs I tried. (I use streaming, and I pass the config
> on the command line.)
>
> -Dmapred.child.java.opts=-Xmx8000m <-- did not bring down the number of mappers
>
> -Dmapred.cluster.map.memory.mb=32000 <-- did not bring down the number of mappers
>
> Am I missing something here?
> I use Hadoop 0.20.205
>
> Thanks a lot in advance!
> -Shaojun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
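
The 'High RAM' arithmetic mentioned above can be sketched roughly as follows. This is a minimal illustration, not the scheduler's actual code: it assumes the CapacityScheduler is enabled, that the cluster-side slot size is set through mapred.cluster.map.memory.mb in the cluster configuration, and that the job requests memory per map task through mapred.job.map.memory.mb. The function names and the 4096 MB slot size are hypothetical, chosen only to match the 24G, 6-slot machines in the question.

```python
def slots_per_task(job_map_mb: int, slot_mb: int) -> int:
    """Slots one map task occupies: the requested memory rounded up to whole slots.

    Assumption: this mirrors the documented High RAM behaviour, where a task
    asking for more than one slot's worth of memory is charged several slots.
    """
    if job_map_mb <= 0 or slot_mb <= 0:
        raise ValueError("memory sizes must be positive")
    return max(1, -(-job_map_mb // slot_mb))  # integer ceiling division


def concurrent_mappers(map_slots: int, job_map_mb: int, slot_mb: int) -> int:
    """How many such map tasks fit on one node that has `map_slots` map slots."""
    return map_slots // slots_per_task(job_map_mb, slot_mb)


if __name__ == "__main__":
    map_slots = 6        # map slots per node, as observed in the question
    slot_mb = 4096       # hypothetical cluster-side slot size (24G / 6 slots)
    job_map_mb = 24576   # hypothetical per-job request: the whole node

    print("slots per task:", slots_per_task(job_map_mb, slot_mb))
    print("mappers per node:", concurrent_mappers(map_slots, job_map_mb, slot_mb))
    # The per-job request would then be passed on the streaming command line:
    print("-Dmapred.job.map.memory.mb=%d" % job_map_mb)
```

Under these assumptions, a job asking for the full 24576 MB occupies all six slots, so only one mapper runs per node. The slot size itself is a cluster-side setting, which may be why passing mapred.cluster.map.memory.mb with the job had no visible effect.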