From: Li Lu
To: yarn-dev@hadoop.apache.org
CC: "Naganarasimha G R (Naga)", Varun Saxena, Sangjin Lee, Junping Du, Vrushali Channapattan, "Joep Rottinghuis"
Date: Tue, 15 Dec 2015 19:37:17 +0000
Subject: Re: [Timeline V2 branch] Latest timeline v2 and SMP problem
OK, I fixed this and verified it locally. The patch is posted to YARN-4460. It is simple, so a quick review would be much appreciated. Anyone hitting this problem is more than welcome to verify.

Thanks!
Li Lu

On Dec 15, 2015, at 11:00, Li Lu wrote:

Thanks Varun and Naga! I verified locally that the V2 publisher introduced in YARN-4129 caused this problem. I'll open a JIRA and post a quick fix right away. Thanks for the information!

Li Lu

On Dec 14, 2015, at 21:38, Naganarasimha G R (Naga) wrote:

Hi Varun & Li,

Yes Varun, the most likely reason is the one you mentioned. The registration has to be done in serviceInit, which the V1 publisher takes care of but the V2 publisher missed. The entire logic in serviceStart of the V2 publisher should be moved to serviceInit.

But I was wondering: for which event/entity did this happen? Was the RM in recovery mode?

Regards,
+ Naga

________________________________
From: Varun Saxena [vsaxena.varun@gmail.com]
Sent: Tuesday, December 15, 2015 10:48
To: Li Lu
Cc: yarn-dev@hadoop.apache.org; Sangjin Lee; Junping Du; Vrushali Channapattan; Joep Rottinghuis; Naganarasimha G R (Naga)
Subject: Re: [Timeline V2 branch] Latest timeline v2 and SMP problem

Hi Li,

This is because we are registering the event handler in serviceStart() instead of serviceInit(). As SMP is the last service in the list, it is started at the very end, i.e. even after all the RPC and UI-related services. This can cause an app flow to start before the SMP/V2Publisher service has even started, which is what causes the issue.

Do you want to raise a JIRA for this issue, or should I? I can handle it.

Regards,
Varun Saxena.

On Tue, Dec 15, 2015 at 8:35 AM, Li Lu wrote:

Thanks Sangjin. I'll keep tracing this.
Meanwhile, if anybody has reproduced the problem, please feel free to let me know.

Thanks!
Li Lu

On Dec 14, 2015, at 18:16, Sangjin Lee wrote:

Can you bisect the commits to see if you can isolate which commit introduced the issue?

On Mon, Dec 14, 2015 at 5:39 PM, Li Lu wrote:

Hi YARN developers working on the Timeline v2 (YARN-2928) branch,

I just realized I'd accidentally turned off SMP for my local Timeline v2 build. After I turned yarn.system-metrics-publisher.enabled back on, the RM fails to start with the following FATAL message:

2015-12-14 17:27:54,125 INFO ipc.Server (Server.java:run(797)) - IPC Server listener on 8033: starting
2015-12-14 17:27:54,127 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread true
java.lang.Exception: No handler for registered for class org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher$SystemMetricsEventType
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:185)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
        at java.lang.Thread.run(Thread.java:745)
2015-12-14 17:27:54,127 INFO event.AsyncDispatcher (AsyncDispatcher.java:register(208)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher$SystemMetricsEventType for class org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler

Interestingly, we're registering this class to the timeline v2 handler in the very next line of the log. I'm wondering whether this is caused by some of my missing configs, or whether it is a newly introduced issue. Has anybody on the feature-YARN-2928 branch noticed this issue?

Thanks!
Li Lu
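To make the ordering problem described in this thread concrete, here is a minimal, self-contained Java sketch. It is a simplified model under stated assumptions, not the YARN-4460 patch: Dispatcher, Publisher, EarlyService and APP_CREATED are hypothetical stand-ins (not Hadoop's AsyncDispatcher, CompositeService or TimelineServiceV2Publisher), and dispatch is synchronous here, whereas the real dispatcher delivers events on its own thread. The composite lifecycle initializes every service in list order and then starts every service in list order, with the publisher added last; a service started earlier emits an event as soon as it starts.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of the SMP start-up ordering problem; none of these
// classes are Hadoop's. Dispatch is synchronous here for simplicity.
public class SmpOrderingSketch {

    // Hypothetical stand-in for SystemMetricsEventType.
    enum EventType { APP_CREATED }

    // Fails on unregistered event types, like the dispatcher in the log.
    static class Dispatcher {
        private final Map<EventType, Consumer<String>> handlers = new EnumMap<>(EventType.class);

        void register(EventType type, Consumer<String> handler) {
            handlers.put(type, handler);
        }

        void dispatch(EventType type, String payload) {
            Consumer<String> handler = handlers.get(type);
            if (handler == null) {
                throw new IllegalStateException("No handler registered for " + type);
            }
            handler.accept(payload);
        }
    }

    // Two-phase lifecycle, loosely mirroring serviceInit()/serviceStart().
    interface Service {
        void init();
        void start();
    }

    // Registers its handler in init() (the fix) or in start() (the bug).
    static class Publisher implements Service {
        private final Dispatcher dispatcher;
        private final boolean registerInInit;
        final List<String> published = new ArrayList<>();

        Publisher(Dispatcher dispatcher, boolean registerInInit) {
            this.dispatcher = dispatcher;
            this.registerInInit = registerInInit;
        }

        private void registerHandler() {
            dispatcher.register(EventType.APP_CREATED, published::add);
        }

        @Override public void init() { if (registerInInit) registerHandler(); }
        @Override public void start() { if (!registerInInit) registerHandler(); }
    }

    // Stand-in for the RPC/UI services started before SMP: emits an event on start.
    static class EarlyService implements Service {
        private final Dispatcher dispatcher;

        EarlyService(Dispatcher dispatcher) { this.dispatcher = dispatcher; }

        @Override public void init() { }
        @Override public void start() { dispatcher.dispatch(EventType.APP_CREATED, "app_1"); }
    }

    // Init every service in list order, then start every service in list order.
    public static String run(boolean registerInInit) {
        Dispatcher dispatcher = new Dispatcher();
        Publisher publisher = new Publisher(dispatcher, registerInInit);
        // The publisher is last in the list, as SMP is in the RM's service list.
        List<Service> services = List.of(new EarlyService(dispatcher), publisher);
        try {
            for (Service s : services) s.init();
            for (Service s : services) s.start();
            return "published " + publisher.published;
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("register in start(): " + run(false));
        System.out.println("register in init():  " + run(true));
    }
}
```

In this model, run(false) returns "failed: No handler registered for APP_CREATED", analogous to the FATAL dispatcher error in the log, while run(true) returns "published [app_1]" because every handler is in place before any service starts. That is the intuition behind moving the registration into serviceInit, as the V1 publisher does: the window closes regardless of where the publisher sits in the service list.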