Subject: Re: NodeManager High CPU due to high GC
From: Daniel Haviv
To: Randy Fox
Cc: "user@hadoop.apache.org"
Date: Sat, 23 Jan 2016 14:00:50 +0200
In-Reply-To: <16711AE3-EFCE-4D46-8403-14296245B192@connexity.com>

Hi Randy,
How many cores do you have on your machines, and how many did you allocate to YARN?

Daniel

On Saturday, 23 January 2016, Randy Fox <rfox@connexity.com> wrote:

> Hi,
>
> We just upgraded to YARN on Hadoop 2.6.0 – CDH5.4.5.
> We are running a large job – 200K mappers, 100K reducers – and we can't
> get through the shuffle phase. The node managers are at 800% CPU with
> high GC activity. The reducers get socket timeouts after running for
> 1.5 hours and fetching only a few percent of the data from the mappers.
> This job took about 30 hours total (12 in the mappers) on MRv1 with no
> issues.
>
> I have looked for configs that might help, for filed issues, and for
> anyone else who has seen this, and I have come up with nothing.
> Does anyone have ideas on things to try, or can anyone explain why the
> node managers are in GC hell and why the data is just not flowing from
> mappers to reducers?
>
> Thanks in advance,
>
> Randy