From: Guang Yang <gyang@millennialmedia.com>
To: user@hadoop.apache.org
CC: Peter Sheridan, Jim Brooks
Subject: question of how to take full advantage of cluster resources
Date: Fri, 14 Dec 2012 23:03:54 +0000
Hi,

We have a beefy Hadoop cluster with 12 worker nodes, each with 32 cores. We have been running MapReduce jobs on this cluster, and we noticed that if we configure the Map/Reduce capacity to be less than the number of available processors in the cluster (32 x 12 = 384), say 216 map slots and 144 reduce slots (360 total), the jobs run fine. But if we configure the total Map/Reduce capacity to be more than 384, we observe that jobs sometimes run unusually long; the symptom is that certain tasks (usually map tasks) are stuck in the "initializing" stage for a long time on certain nodes before being processed. The nodes exhibiting this behavior are random and not tied to specific boxes. Isn't the general rule of thumb to configure M/R capacity to be twice the number of processors in the cluster? What do people usually do to maximize cluster resource usage in terms of capacity configuration? I'd appreciate any responses.

Thanks,
Guang Yang
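For reference, the per-node slot caps described above would typically be set in mapred-site.xml on each TaskTracker in an MRv1 (Hadoop 1.x) cluster. The totals quoted (216 map / 144 reduce across 12 nodes) work out to 18 map and 12 reduce slots per node; the snippet below is a sketch of how such a configuration might look, not the poster's actual config:

```xml
<!-- mapred-site.xml on each TaskTracker (Hadoop 1.x / MRv1).
     Values are illustrative: 18 map + 12 reduce slots per node
     x 12 nodes = 216 map / 144 reduce slots cluster-wide (360 total),
     i.e. slightly under the 384 cores available. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>18</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>12</value>
  </property>
</configuration>
```

Raising these two values past the per-node core count is what pushes total cluster capacity beyond 384 in the scenario described.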