Date: Wed, 31 May 2017 21:45:04 +0000 (UTC)
From: "Vrushali C (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-6323) Rolling upgrade/config change is broken on timeline v2.

    [ https://issues.apache.org/jira/browse/YARN-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032023#comment-16032023 ]

Vrushali C commented on YARN-6323:
----------------------------------

Hmm, I have been thinking this over, and I think we all discussed it a bit in the last weekly call too.

During an upgrade, in any case, there won't be complete information for that flow, since some containers would have already finished, some might be running on older nodes, and some might start on newer ones. The NM does not have the app name but needs to create a default flow context upon restart. The only thing that I can see it can use is the app id.

We could put in a special case to drop the data in the writer if a particular flow context is being used. What I mean is, when the NM restarts with ATSv2 enabled for the first time and does not find an existing flow context, we create a specific dummy flow context and we check for that in the writer. If it matches this "drop data" flow context, we simply do not write the data to the backend.
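The "drop data" idea described above could look roughly like the following. This is only a minimal, self-contained sketch and not code from the YARN-6323 patch: the class names (FlowContext, NodeManagerState, Writer), the reserved flow name, and the in-memory "backend" list are all hypothetical stand-ins chosen for illustration, and none of them are actual Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; every name here is an assumption, not a Hadoop class.
class DropDataFlowContextSketch {

  /** Hypothetical reserved flow name marking the dummy "drop data" context. */
  static final String DROP_DATA_FLOW_NAME = "__drop_data__";

  /** Hypothetical stand-in for the flow context attached to an application. */
  static final class FlowContext {
    final String flowName;
    final String flowVersion;
    final long flowRunId;

    FlowContext(String flowName, String flowVersion, long flowRunId) {
      this.flowName = flowName;
      this.flowVersion = flowVersion;
      this.flowRunId = flowRunId;
    }

    /** True if this is the dummy context created for "left over" apps. */
    boolean isDropData() {
      return DROP_DATA_FLOW_NAME.equals(flowName);
    }
  }

  /** Hypothetical view of the flow contexts an NM recovered on restart. */
  static final class NodeManagerState {
    private final Map<String, FlowContext> recovered = new ConcurrentHashMap<>();

    void store(String appId, FlowContext context) {
      recovered.put(appId, context);
    }

    /** Falls back to the dummy context when nothing was stored for the app. */
    FlowContext contextFor(String appId) {
      FlowContext context = recovered.get(appId);
      return context != null ? context : new FlowContext(DROP_DATA_FLOW_NAME, "0", 0L);
    }
  }

  /** Hypothetical writer; the list stands in for the real storage backend. */
  static final class Writer {
    final List<String> backend = new ArrayList<>();

    /** Returns false (and writes nothing) for the "drop data" context. */
    boolean write(FlowContext context, String appId, String entity) {
      if (context.isDropData()) {
        return false;
      }
      backend.add(context.flowName + "/" + context.flowRunId + "/" + appId + "/" + entity);
      return true;
    }
  }

  public static void main(String[] args) {
    NodeManagerState nm = new NodeManagerState();
    Writer writer = new Writer();

    // App that was already running before timeline v2 was enabled: no stored context.
    String oldApp = "application_1_0001";
    System.out.println(writer.write(nm.contextFor(oldApp), oldApp, "container_start"));

    // App whose flow context was persisted: data is written normally.
    String newApp = "application_1_0002";
    nm.store(newApp, new FlowContext("my_flow", "1", 42L));
    System.out.println(writer.write(nm.contextFor(newApp), newApp, "container_start"));
  }
}
```

In this sketch the check is by reserved flow name rather than object identity, on the assumption that a context may be serialized and restored between NM restarts.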
With YARN-6555, the work preserving restart will ensure that the flow context is written and thus will be available when the NM restarts on later occasions, so the dummy flow context won't be used in future cases.

> Rolling upgrade/config change is broken on timeline v2.
> --------------------------------------------------------
>
>                 Key: YARN-6323
>                 URL: https://issues.apache.org/jira/browse/YARN-6323
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Vrushali C
>              Labels: yarn-5355-merge-blocker
>         Attachments: YARN-6323.001.patch
>
>
> Found this issue when deploying on real clusters. If there are apps running when we enable timeline v2 (with work preserving restart enabled), node managers will fail to start due to missing app context data. We should probably assign some default names to these "left over" apps. I believe it's suboptimal to let users clean up the whole cluster before enabling timeline v2.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)