Date: Thu, 30 Jul 2015 14:38:05 +0000 (UTC)
From: "Hong Zhiguo (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Updated] (YARN-3965) Add startup timestamp for nodemanager

[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-3965:
------------------------------
    Attachment: YARN-3965-3.patch

> Add startup timestamp for nodemanager
> -------------------------------------
>
>                 Key: YARN-3965
>                 URL: https://issues.apache.org/jira/browse/YARN-3965
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Minor
>         Attachments: YARN-3965-2.patch, YARN-3965-3.patch, YARN-3965.patch
>
>
> We already have a startup timestamp for the RM, but not for the NM.
> Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and then finds it hard to check whether every NM has actually restarted. In practice, some NMs always fail to restart as expected, and the resulting inconsistent configuration later causes errors.
> With a startup timestamp for the NM, the operator could easily fetch it through the NM web service, find out which NMs did not restart, and take manual action on them.
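The check described above can be sketched as follows. This is a minimal illustration that assumes each NM's startup timestamp has already been fetched (the web service field name and the fetch step are not specified in this issue) and collected into a map from hostname to epoch milliseconds. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: flags NMs whose reported startup timestamp predates
// the moment the cluster-wide restart command was issued.
public class StaleNodeManagerCheck {

    /**
     * @param startupTimes    hostname -> NM startup timestamp (epoch millis),
     *                        assumed to have been collected from each NM beforehand
     * @param restartIssuedAt epoch millis when the restart command was kicked off
     * @return sorted hostnames of NMs that apparently did not restart
     */
    static List<String> findNotRestarted(Map<String, Long> startupTimes, long restartIssuedAt) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : startupTimes.entrySet()) {
            // An NM that was restarted must report a startup time at or after
            // the restart command; anything earlier is still the old process.
            if (e.getValue() < restartIssuedAt) {
                stale.add(e.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }

    public static void main(String[] args) {
        long restartIssuedAt = 1_438_266_000_000L; // illustrative value only
        Map<String, Long> startupTimes = new TreeMap<>();
        startupTimes.put("nm-01", restartIssuedAt + 45_000L);     // restarted
        startupTimes.put("nm-02", restartIssuedAt - 86_400_000L); // old process still running
        startupTimes.put("nm-03", restartIssuedAt + 52_000L);     // restarted
        System.out.println(findNotRestarted(startupTimes, restartIssuedAt)); // prints [nm-02]
    }
}
```

The comparison depends on the clocks of the operator's host and the NM hosts being reasonably in sync. If they are skewed, a small grace margin subtracted from `restartIssuedAt` would help avoid false alarms.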