hadoop-yarn-issues mailing list archives

From "Chandni Singh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
Date Wed, 08 Aug 2018 02:54:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572593#comment-16572593 ]

Chandni Singh commented on YARN-8160:

 Exit code 255 is coming from docker inspect container_e02_1533231998644_0009_01_000003. There
looks to be a race condition: the ContainerLaunch thread has already issued the termination of the docker
container pid, while LinuxContainerExecutor still has an independent child process checking
the liveness of the docker container.
[~eyang], the container exit code comes from the statement below in {{ContainerLaunch.call()}}:
ret = launchContainer(new ContainerStartContext.Builder()

Running docker inspect on a container that has already been stopped and cleaned up would just report that the
container is not alive. How does that affect the container's exit code? I cannot find this in
the code; could you please point me to it?

I still think the following are the only two solutions:
1. In the NodeManager, if a container in the REINITIALIZING_AWAITING_KILL state receives a CONTAINER_EXITED_WITH_FAILURE
event, it should handle it the same way it currently handles CONTAINER_KILLED_ON_REQUEST.

2. Do not clean up the container files until the container has exited.
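To make option 1 concrete, here is a minimal sketch using a heavily simplified, hypothetical model of a container's lifecycle; it is not the NodeManager's actual container state machine. Only REINITIALIZING_AWAITING_KILL, CONTAINER_EXITED_WITH_FAILURE and CONTAINER_KILLED_ON_REQUEST come from the discussion; the class names, the other states and the recorded action strings are illustrative assumptions. It also reflects option 2, since cleanup is only recorded once an exit event has arrived.

```java
import java.util.ArrayList;
import java.util.List;

public class ReinitSketch {
    // Hypothetical, simplified states; only REINITIALIZING_AWAITING_KILL is from the discussion.
    enum State { RUNNING, REINITIALIZING_AWAITING_KILL, RELAUNCHING, EXITED_WITH_FAILURE }

    // Event names taken from the discussion.
    enum Event { CONTAINER_KILLED_ON_REQUEST, CONTAINER_EXITED_WITH_FAILURE }

    static final class Container {
        State state = State.RUNNING;
        final List<String> actions = new ArrayList<>();

        // Upgrade requested: issue the kill and wait for the exit event before
        // touching the container's files (option 2: no cleanup before exit).
        void reInitialize() {
            actions.add("kill");
            state = State.REINITIALIZING_AWAITING_KILL;
        }

        void handle(Event event) {
            boolean exitedAfterKill = event == Event.CONTAINER_KILLED_ON_REQUEST
                    || event == Event.CONTAINER_EXITED_WITH_FAILURE;
            if (state == State.REINITIALIZING_AWAITING_KILL && exitedAfterKill) {
                // Option 1: a failure exit while awaiting the kill is handled exactly
                // like CONTAINER_KILLED_ON_REQUEST, because the kill we issued may
                // surface as a non-zero exit code (e.g. 255).
                actions.add("cleanup");
                actions.add("relaunch");
                state = State.RELAUNCHING;
            } else if (state == State.RUNNING && event == Event.CONTAINER_EXITED_WITH_FAILURE) {
                // Outside of an upgrade, a failure exit is still a real failure.
                state = State.EXITED_WITH_FAILURE;
            }
            // Other transitions are omitted from this sketch.
        }
    }

    public static void main(String[] args) {
        Container c = new Container();
        c.reInitialize();
        c.handle(Event.CONTAINER_EXITED_WITH_FAILURE);
        System.out.println(c.state + " " + c.actions);
    }
}
```

The design point is that, in the awaiting-kill state, the exit code alone cannot distinguish "killed because we asked" from "failed", so both events lead to the same relaunch path.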

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> ----------------------------------------------------------------------------
>                 Key: YARN-8160
>                 URL: https://issues.apache.org/jira/browse/YARN-8160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>         Attachments: container_e02_1533231998644_0009_01_000003.nm.log
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} API. {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information that needs to upgrade the
> With {{reInitializeContainer}}, the following do *NOT* change:
> - container ID. This is not created by the NM; it is provided to the NM, and here the RM does not create another container allocation.
> - {{localizedResources}}: this stays the same if the upgrade does *NOT* require additional resources, IIUC.
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a relaunch. 
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. Investigate and fix this.
> - [Future change] Add an API to the NM to pull the images, and modify {{reInitializeContainer}} to trigger the Docker container launch without pulling the image first, possibly controlled by a flag.
>     -- When the service upgrade is initialized, we can provide the user with an option to just pull the images on the NMs.
>     -- When a component instance is upgraded, it calls {{reInitializeContainer}} with the pull-image flag set to false, since the NM will already have pulled the images.

This message was sent by Atlassian JIRA
