hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5620) Core changes in NodeManager to support for upgrade and rollback of Containers
Date Mon, 12 Sep 2016 07:36:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15483362#comment-15483362 ]

Arun Suresh edited comment on YARN-5620 at 9/12/16 7:35 AM:
------------------------------------------------------------

Updating patch.
* Addressing [~jianhe]'s latest comments
* some javadoc, checkstyle and javac fixes

bq. IIUC, in this case, the ContainerImpl will receive the KILL event first and move to the
KILLING state, and the CONTAINER_KILLED_ON_REQUEST will be sent to the container at KILLING
state..
It goes to the KILLING state only if the AM explicitly sends a kill signal or the RM asks the NM to
kill. It is also possible that an admin logs into the NM and does a 'kill -9', which will
also cause the ContainerLaunch to send CONTAINER_KILLED_ON_REQUEST, but the container won't be in the
KILLING state. Right?
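
To make the two paths concrete, here is a toy sketch. It is not the actual ContainerImpl state machine: the class name, the reduced set of states, and the target state names (CLEANED_UP_AFTER_KILL, KILLED_EXTERNALLY) are simplified stand-ins assumed for illustration. The only point is that CONTAINER_KILLED_ON_REQUEST can arrive either in KILLING or directly in RUNNING.

```java
/** Toy model only; states and events are simplified, hypothetical stand-ins. */
class KillEventSketch {
    enum State { RUNNING, KILLING, CLEANED_UP_AFTER_KILL, KILLED_EXTERNALLY }
    enum Event { KILL_CONTAINER, CONTAINER_KILLED_ON_REQUEST }

    /** Returns the next state, or throws if the event is not handled in this state. */
    static State next(State state, Event event) {
        if (state == State.RUNNING && event == Event.KILL_CONTAINER) {
            // The AM or RM asked for the kill: the container moves to KILLING first.
            return State.KILLING;
        }
        if (state == State.KILLING && event == Event.CONTAINER_KILLED_ON_REQUEST) {
            // The launcher reports the process is gone after a requested kill.
            return State.CLEANED_UP_AFTER_KILL;
        }
        if (state == State.RUNNING && event == Event.CONTAINER_KILLED_ON_REQUEST) {
            // Someone ran 'kill -9' on the node: no KILL event was ever seen,
            // so the same launcher event arrives while still RUNNING.
            return State.KILLED_EXTERNALLY;
        }
        throw new IllegalStateException(event + " not handled in " + state);
    }

    public static void main(String[] args) {
        // Path 1: kill requested through the AM or RM.
        State requested = next(State.RUNNING, Event.KILL_CONTAINER);
        requested = next(requested, Event.CONTAINER_KILLED_ON_REQUEST);
        System.out.println("requested kill ends in " + requested);

        // Path 2: process killed behind the NM's back.
        State external = next(State.RUNNING, Event.CONTAINER_KILLED_ON_REQUEST);
        System.out.println("external kill ends in " + external);
    }
}
```

If the RUNNING case were missing, the second path would throw, which is why the event needs a handler outside the KILLING state as well.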

bq. ..In testContainerUpgradeSuccess, could you make newStartFile a new upgrade resource,
and verify the output is written into it, this verifies the part about the localization part
as well.
Actually, if you look at the _prepareContainerUpgrade()_ function, we create a new script file
*scriptFile_new*, which is passed into the _prepareContainerLaunchContext()_ function, which
associates the new file with a new *dest_file_new* location. This should verify that the upgrade
needed a new localized resource. The output of the script is also written to a new *start_file_n.txt*,
which we read and verify to check that the new process has actually started.

Also by the way:

bq. We can use the ResourceSet#getAllResourcesByVisibility method instead, and so the getLocalPendingRequests
method and the new constructor in ContainerLocalizationRequestEvent is not needed
The problem with getAllResourcesByVisibility is that it gets all resources. I just need the pending
resources. So if you are OK with it, I'd like to keep it as is.
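
A minimal sketch of the distinction, assuming a much simplified resource set. The class and its methods (request, markLocalized, allByVisibility, pendingByVisibility) are hypothetical names for illustration and are not the real ResourceSet API. After an upgrade adds one new resource, the "all" view still contains the already localized one, while the "pending" view contains only what still needs to be requested.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in; not the actual ResourceSet class. */
class PendingResourcesSketch {
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    private final Map<Visibility, List<String>> all = new EnumMap<>(Visibility.class);
    private final Map<Visibility, List<String>> pending = new EnumMap<>(Visibility.class);

    /** Registers a resource; it is pending until it is marked localized. */
    void request(Visibility v, String resource) {
        all.computeIfAbsent(v, k -> new ArrayList<>()).add(resource);
        pending.computeIfAbsent(v, k -> new ArrayList<>()).add(resource);
    }

    /** Removes a resource from the pending view once it has been localized. */
    void markLocalized(Visibility v, String resource) {
        List<String> list = pending.get(v);
        if (list != null) {
            list.remove(resource);
            if (list.isEmpty()) {
                pending.remove(v);
            }
        }
    }

    /** Every resource ever requested, localized or not. */
    Map<Visibility, List<String>> allByVisibility() {
        return Collections.unmodifiableMap(all);
    }

    /** Only the resources that still need a localization request. */
    Map<Visibility, List<String>> pendingByVisibility() {
        return Collections.unmodifiableMap(pending);
    }

    public static void main(String[] args) {
        PendingResourcesSketch rs = new PendingResourcesSketch();
        rs.request(Visibility.APPLICATION, "dest_file");      // original launch
        rs.markLocalized(Visibility.APPLICATION, "dest_file");
        rs.request(Visibility.APPLICATION, "dest_file_new");  // added by the upgrade

        System.out.println("all:     " + rs.allByVisibility());
        System.out.println("pending: " + rs.pendingByVisibility());
    }
}
```

Sending a localization request built from the "all" view would ask again for resources that are already in place, which is the reason for keeping a separate pending accessor.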




> Core changes in NodeManager to support for upgrade and rollback of Containers
> -----------------------------------------------------------------------------
>
>                 Key: YARN-5620
>                 URL: https://issues.apache.org/jira/browse/YARN-5620
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5620.001.patch, YARN-5620.002.patch, YARN-5620.003.patch, YARN-5620.004.patch,
YARN-5620.005.patch, YARN-5620.006.patch, YARN-5620.007.patch, YARN-5620.008.patch, YARN-5620.009.patch
>
>
> JIRA proposes to modify the ContainerManager (and other core classes) to support upgrade
of a running container with a new {{ContainerLaunchContext}} as well as the ability to rollback
the upgrade if the container is not able to restart using the new launch Context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

