Subject: Re: NodeManager High CPU due to high GC
From: sudhakara st
To: Randy Fox
Cc: user@hadoop.apache.org
Date: Thu, 28 Jan 2016 17:19:06 +0530

Hello Randy,

200K mappers and 100K reducers is too many tasks for any cluster, and that
has indirect effects. It looks like the reducers either cannot get enough
memory to do their processing, or cannot get containers allocated at all.
Check the values of these two parameters:
yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores.
It would also be worth reading
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas

How many nodes does your cluster have?

Regards,
sudhakara
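To make that concrete, here is a minimal yarn-site.xml sketch of those two
NodeManager settings (the values are illustrative placeholders, not
recommendations; size them to your own nodes):

<!-- yarn-site.xml: total resources the NodeManager may hand out to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- Illustrative value: physical RAM minus headroom for the OS and Hadoop daemons. -->
  <value>65536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <!-- Illustrative value: 22 of 24 cores, matching the allocation Randy describes below. -->
  <value>22</value>
</property>

If these are left at the Hadoop 2.x defaults (8192 MB and 8 vcores), the
scheduler will place far less work on each node than the hardware can take;
values sized too high can oversubscribe the node instead.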
On Tue, Jan 26, 2016 at 9:55 PM, Randy Fox <rfox@connexity.com> wrote:

> What configs control the shuffle phase?
>
> From: Randy Fox
> Date: Saturday, January 23, 2016 at 9:53 AM
> To: Daniel Haviv
> Cc: "user@hadoop.apache.org"
> Subject: Re: NodeManager High CPU due to high GC
>
> 24 virtual cores, and we allocated 22 for Yarn.
>
> From: Daniel Haviv
> Date: Saturday, January 23, 2016 at 4:00 AM
> To: Randy Fox
> Cc: "user@hadoop.apache.org"
> Subject: Re: NodeManager High CPU due to high GC
>
> Hi Randy,
> How many cores do you have on your machines, and how many did you
> allocate to Yarn?
>
> Daniel
>
> On Saturday, 23 January 2016, Randy Fox <rfox@connexity.com> wrote:
>
>> Hi,
>>
>> We just upgraded to using Yarn on Hadoop 2.6.0 (CDH 5.4.5).
>> We are running a large job (200K mappers, 100K reducers) and we can't
>> get through the shuffle phase. The node managers are at 800% CPU with
>> high GC. The reducers get socket timeouts after 1.5 hours of running,
>> having fetched only a few percent of the data from the mappers. This
>> job took about 30 hours total (12 in the mappers) on MRv1 with no issues.
>>
>> I have looked for configs that might help, for filed issues, and for
>> anyone who has seen this, and I have come up with nothing.
>> Does anyone have ideas on things to try, or an explanation of why the
>> node managers are in GC hell and why the data is just not flowing from
>> mappers to reducers?
>>
>> Thanks in advance,
>>
>> Randy

--
Regards,
...sudhakara
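On Randy's question about which configs control the shuffle phase: a hedged
mapred-site.xml sketch of the Hadoop 2.x properties most often involved (the
values shown are believed to be the stock 2.6 defaults; verify them against
your distribution before tuning anything):

<!-- mapred-site.xml: properties governing the MR2 shuffle (Hadoop 2.6 defaults) -->
<property>
  <!-- Fetch threads each reducer uses to pull map output in parallel. -->
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>5</value>
</property>
<property>
  <!-- NodeManager-side threads serving map output; 0 means twice the core count. -->
  <name>mapreduce.shuffle.max.threads</name>
  <value>0</value>
</property>
<property>
  <!-- Cap on concurrent shuffle connections to a node; 0 means unlimited. -->
  <name>mapreduce.shuffle.max.connections</name>
  <value>0</value>
</property>
<property>
  <!-- Per-fetch connect and read timeouts, in milliseconds (3 minutes each). -->
  <name>mapreduce.reduce.shuffle.connect.timeout</name>
  <value>180000</value>
</property>
<property>
  <name>mapreduce.reduce.shuffle.read.timeout</name>
  <value>180000</value>
</property>

With 100K reducers each opening up to 5 parallel fetches against unlimited
per-node connections, the NodeManagers' shuffle handlers could plausibly
become the bottleneck, which would be consistent with the 800% CPU and GC
pressure reported above.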