hadoop-yarn-issues mailing list archives

From "Junping Du (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
Date Tue, 02 Aug 2016 23:01:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404963#comment-15404963 ]

Junping Du commented on YARN-4676:
----------------------------------

bq. If the NM crashes (for example, a JVM exit due to running out of heap), it is supposed to restart
automatically instead of waiting for a human to start it. Isn't that the general practice?
I don't think this is the general case, as YARN deployments vary widely - in many cases
(especially in on-premise environments), the NM is not expected to be so fragile, and the admin needs
to figure out what happened before the NM crashed. Also, even if we want the NM to restart
immediately (without human assistance/troubleshooting), the auto-restart logic lives outside
YARN and belongs to cluster deployment/monitoring tools like Ambari. We'd better not
bake too many assumptions in here.

bq. But nothing prevents/disallows the NM daemon from restarting, whether automatically or by a human.
When such an NM restarts, it will try to register itself with the RM, and will be told to shut down
if it still appears in the exclude list. Such a node will remain DECOMMISSIONED inside the RM
until, 10+ minutes later, it becomes LOST after the EXPIRE event.
As I said above, this belongs to the admin's behavior or your monitoring tools' logic. If
an admin insists on repeatedly starting an NM on a decommissioned node, YARN can do
nothing about it except keep shutting the NM down. Such a node should always remain DECOMMISSIONED,
and I don't see any benefit in moving it to the EXPIRE status.

bq. Such a DECOMMISSIONED node can be recommissioned (refreshNodes after it is removed from
the exclude list), during which it transitions into the RUNNING state.
I don't see what benefit this hack brings compared with running refreshNodes after moving the node
to the include list and restarting the NM daemon, which goes through the normal registration process.
The risk is that we have to maintain a separate code path dedicated to this minor case.

bq. This behavior appears to me as robust rather than hacky. It appears that the behavior
you expect relies on a separate mechanism that permanently shuts down the NM once it is DECOMMISSIONED.
I have never heard that we need a separate mechanism to shut down the NM once it is decommissioned.
That should be built-in behavior for Apache Hadoop YARN. Are you talking about a private/specific
branch rather than current trunk/branch-2?

bq. As long as such a DECOMMISSIONED node never tries to register or be recommissioned, yes, I
expect the transitions you listed could be removed.
The re-registration of a node after a refreshNodes operation goes through the normal registration
process, which is good enough for me. I don't think we need any change here unless we have
strong reasons. So, yes, please remove these transitions, because they are not correct based
on current YARN logic.

bq. So I see these transitions are really needed. That said, I could remove them and maintain
them privately inside the EMR branch for the sake of getting this JIRA going.
I can understand the pain of maintaining a private branch - from the perspective of your private
(EMR) branch, these pieces of code may be needed. However, as a community contributor, you
have to switch roles and stand on the community code base in trunk/branch-2, and we committers
can only help get in code that benefits the whole community. If this code is important for
another story (like resource elasticity of YARN) that benefits the community, we can move it
out into another dedicated work item, but we need an open discussion on design/implementation
ahead of time - that's the right process for patch/feature contribution.

bq. These transitions have been there almost since the beginning of this JIRA; any other comments/surprises?
These issues have already surprised me enough - the transitions in RMNode belong to very core
YARN logic, and we need to be careful as always. I need more time to review the rest
of the code. Hopefully I can finish my first round tomorrow and publish the remaining comments.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, GracefulDecommissionYarnNode.pdf,
YARN-4676.004.patch, YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch,
YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch,
YARN-4676.014.patch, YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, YARN-4676.018.patch,
YARN-4676.019.patch
>
>
> YARN-4676 implements an automatic, asynchronous, and flexible mechanism to gracefully decommission
> YARN nodes. After a user issues the refreshNodes request, the ResourceManager automatically
> evaluates the status of all affected nodes and kicks off decommission or recommission actions.
> The RM asynchronously tracks container and application status related to DECOMMISSIONING nodes
> and decommissions each node as soon as it is ready to be decommissioned. Decommissioning timeouts
> at per-node granularity are supported and can be dynamically updated. The mechanism naturally
> supports multiple independent graceful decommissioning “sessions”, each involving a different
> set of nodes with different timeout settings. Such support is ideal and necessary for graceful
> decommission requests issued by external cluster management software instead of a human.
> DecommissioningNodeWatcher inside ResourceTrackerService tracks DECOMMISSIONING node status
> automatically and asynchronously after the client/admin makes the graceful decommission request.
> It tracks DECOMMISSIONING node status to decide when, after all running containers on the node
> have completed, the node will be transitioned into the DECOMMISSIONED state. NodesListManager
> detects and handles include and exclude list changes to kick off decommission or recommission
> as necessary.
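The tracking idea in the description above can be sketched in a few lines of Java. Note that this is NOT the actual DecommissioningNodeWatcher from the patch - the class, fields, and method names below are illustrative assumptions. It only shows the core decision: a DECOMMISSIONING node moves to DECOMMISSIONED as soon as its running containers drain or its per-node timeout expires, and a recommissioned node simply drops out of tracking.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, self-contained sketch of per-node decommission tracking.
// Names here are hypothetical; the real watcher lives in the RM and is
// driven by NM heartbeats through ResourceTrackerService.
public class DecomTrackerSketch {

    enum NodeState { DECOMMISSIONING, DECOMMISSIONED }

    static class TrackedNode {
        NodeState state = NodeState.DECOMMISSIONING;
        int runningContainers;
        final long deadlineMillis; // per-node decommission timeout

        TrackedNode(int runningContainers, long deadlineMillis) {
            this.runningContainers = runningContainers;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final Map<String, TrackedNode> nodes = new ConcurrentHashMap<>();

    // Begin tracking a node after refreshNodes puts it in DECOMMISSIONING.
    void startTracking(String nodeId, int runningContainers, long timeoutMillis) {
        nodes.put(nodeId, new TrackedNode(runningContainers,
                System.currentTimeMillis() + timeoutMillis));
    }

    // Called on each heartbeat: decommission once the node has drained all
    // containers, or once its individual timeout has expired.
    NodeState onHeartbeat(String nodeId, int runningContainers, long now) {
        TrackedNode n = nodes.get(nodeId);
        if (n == null || n.state == NodeState.DECOMMISSIONED) {
            return NodeState.DECOMMISSIONED;
        }
        n.runningContainers = runningContainers;
        if (runningContainers == 0 || now >= n.deadlineMillis) {
            n.state = NodeState.DECOMMISSIONED;
        }
        return n.state;
    }

    // Recommission: the node was removed from the exclude list again.
    void stopTracking(String nodeId) {
        nodes.remove(nodeId);
    }
}
```

Because each tracked node carries its own deadline, independent decommissioning "sessions" with different timeouts fall out of this structure for free, which matches the multi-session behavior the description claims.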



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

