hadoop-yarn-issues mailing list archives

From "Hong Zhiguo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3965) Add starup timestamp for nodemanager
Date Sat, 25 Jul 2015 09:26:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641473#comment-14641473 ]

Hong Zhiguo commented on YARN-3965:

Hi [~zxu], thanks for your comments.  Here are my thoughts after reconsidering.

1. nmStartupTime could be a non-static field of NodeManager, but that makes it harder to access,
since the accessor must hold a reference to the NodeManager instance.  For example, there is
no such reference in the current implementation of the NodeInfo constructor.  One option is to make
nmStartupTime a non-static field of NMContext, but I doubt it is worth making a simple
thing complicated.  BTW, the startup timestamp of ResourceManager is also static.

2. It's "final", so we don't need to worry about that.  A private field with a getter is also OK if
you think it's better.
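
To illustrate points 1 and 2, here is a minimal sketch of the static approach. It is not
the attached patch: NodeManagerSketch and NodeInfoSketch are hypothetical stand-ins for
NodeManager and NodeInfo, and the getter name is only an assumption. It shows that a
private static final field with a getter can be read from a constructor that holds no
reference to the NodeManager instance.

```java
// Hypothetical sketch only; these are NOT the real Hadoop classes.
public class StartupTimestampSketch {

    /** Stand-in for NodeManager. */
    static final class NodeManagerSketch {
        // Captured once, when the class is initialized. "final" prevents
        // reassignment, and "private" forces callers through the getter.
        private static final long NM_STARTUP_TIME = System.currentTimeMillis();

        static long getNMStartupTime() {
            return NM_STARTUP_TIME;
        }
    }

    /** Stand-in for NodeInfo, built without any NodeManager reference. */
    static final class NodeInfoSketch {
        final long nmStartupTime;

        NodeInfoSketch() {
            // Static access: no NodeManager instance is needed here.
            this.nmStartupTime = NodeManagerSketch.getNMStartupTime();
        }
    }

    public static void main(String[] args) {
        NodeInfoSketch first = new NodeInfoSketch();
        NodeInfoSketch second = new NodeInfoSketch();
        // Both objects report the same startup time for this JVM.
        System.out.println(first.nmStartupTime == second.nmStartupTime);
    }
}
```

With a non-static field, the NodeInfoSketch constructor would instead need a NodeManager
(or NMContext) reference passed in, which is the extra plumbing discussed in point 1.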

> Add starup timestamp for nodemanager
> ------------------------------------
>                 Key: YARN-3965
>                 URL: https://issues.apache.org/jira/browse/YARN-3965
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Minor
>         Attachments: YARN-3965-2.patch, YARN-3965.patch
> We already have a startup timestamp for the RM, but not for the NM.
> Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command
> to restart all NMs, then finds it hard to check whether all NMs have restarted.
> In practice there are always some NMs that did not restart as expected, which leads to errors
> later due to inconsistent configuration.
> If we had a startup timestamp for the NM, the operator could easily fetch it via the NM web service,
> find out which NMs did not restart, and take manual action on them.

This message was sent by Atlassian JIRA
