hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
Date Wed, 24 Feb 2016 19:28:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163564#comment-15163564 ]

Arun Suresh commented on YARN-1040:

Spent some time going through the conversation (this one as well as YARN-1404).
Given that this has been tracked as a requirement for in-place application upgrades and it
has been some time since any activity was posted here, [~bikassaha] / [~vinodkv] / [~hitesh]
/ [~tucu00] / [~steve_l], can you kindly clarify the following?
# Are we still trying to handle the case where we have > 1 process running against a
container *at the same time*?
# Have we decided that allowing a container with 0 processes running is a bad idea?

From the context of getting application upgrades working, I guess 1) can be relaxed to
exactly 1 process running under a container, with the AM having the option of starting it explicitly via
the {{startProcess(containerLaunchContext)}} API Bikas mentioned (an additional constraint
could be that {{startProcess()}} has to be called within a timeout if no {{ContainerLaunchContext}}
was provided with the initial {{startContainer()}}, else the NM will deem the container dead).
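
To make the timeout constraint concrete, here is a minimal sketch of how it could behave. It is a toy model only, not NodeManager code: the class {{IdleContainerSketch}}, its states, and the use of a plain string in place of a {{ContainerLaunchContext}} are all assumptions made for illustration. Calling {{startProcess}} while a process is already running is simply rejected here to keep the sketch focused on the timeout.

```java
/** Hypothetical model of the proposed startProcess timeout; not NodeManager code. */
class IdleContainerSketch {
    enum State { WAITING_FOR_PROCESS, RUNNING, DEAD }

    private final long createdAtMs;
    private final long startProcessTimeoutMs;
    private State state;
    private String command; // stand-in for a ContainerLaunchContext (assumption)

    /** initialCommand may be null: the container then starts with 0 processes. */
    IdleContainerSketch(String initialCommand, long nowMs, long startProcessTimeoutMs) {
        this.createdAtMs = nowMs;
        this.startProcessTimeoutMs = startProcessTimeoutMs;
        this.command = initialCommand;
        this.state = (initialCommand == null) ? State.WAITING_FOR_PROCESS : State.RUNNING;
    }

    /** Returns true if the process was started; false if the container is dead or busy. */
    boolean startProcess(String newCommand, long nowMs) {
        expireIfIdleTooLong(nowMs);
        if (state != State.WAITING_FOR_PROCESS) {
            return false;
        }
        command = newCommand;
        state = State.RUNNING;
        return true;
    }

    State stateAt(long nowMs) {
        expireIfIdleTooLong(nowMs);
        return state;
    }

    /** An empty container that outlives the timeout is deemed dead. */
    private void expireIfIdleTooLong(long nowMs) {
        if (state == State.WAITING_FOR_PROCESS
                && nowMs - createdAtMs > startProcessTimeoutMs) {
            state = State.DEAD;
        }
    }

    public static void main(String[] args) {
        IdleContainerSketch onTime = new IdleContainerSketch(null, 0L, 1_000L);
        System.out.println(onTime.startProcess("region-server", 500L)); // true

        IdleContainerSketch late = new IdleContainerSketch(null, 0L, 1_000L);
        System.out.println(late.startProcess("region-server", 1_500L)); // false
        System.out.println(late.stateAt(1_500L));                       // DEAD
    }
}
```

Timestamps are passed in explicitly so the behaviour is deterministic; a real implementation would presumably rely on the NM's own clock and container monitoring.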

In addition, I was also thinking:
# If a process is already running in the container when a {{startProcess(ContainerLaunchContext)}}
is received, then the first process is killed and another is started using the new {{ContainerLaunchContext}}.
# Maybe we can refine the above by adding an {{upgradeProcess(ContainerLaunchContext)}} API that
can additionally take on a policy like:
## auto-rollback if the new process does not start within a timeout.
## Rollback could either mean keeping the old process running until the upgraded process is up
-or- if we want to preserve the semantics of only 1 process per container, first killing the old
process and trying to start the new one, and on failure restarting the old version.
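
The two rollback variants above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{UpgradeSketch}}, {{RollbackPolicy}}, the string commands standing in for {{ContainerLaunchContext}}, and the {{launcher}} predicate (which answers "did this process come up within the timeout?") are all hypothetical names, not part of any existing YARN API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of startProcess/upgradeProcess semantics; not YARN code. */
class UpgradeSketch {
    enum RollbackPolicy { KEEP_OLD_UNTIL_NEW_IS_UP, KILL_OLD_THEN_RESTART_ON_FAILURE }

    private final Predicate<String> launcher; // true if the process came up in time
    private final List<String> events = new ArrayList<>();
    private String running; // command of the single running process, or null

    UpgradeSketch(Predicate<String> launcher) {
        this.launcher = launcher;
    }

    String running() { return running; }
    List<String> events() { return events; }

    /** Replace semantics: any running process is killed before the new one starts. */
    boolean startProcess(String command) {
        killCurrent();
        return tryStart(command);
    }

    /** Returns true if the upgrade succeeded; on failure the old version is kept or restored. */
    boolean upgradeProcess(String newCommand, RollbackPolicy policy) {
        String old = running;
        if (policy == RollbackPolicy.KEEP_OLD_UNTIL_NEW_IS_UP) {
            // Briefly allows two processes: the old one is only killed once the new one is up.
            if (!launcher.test(newCommand)) {
                events.add("failed " + newCommand);
                return false; // old process was never touched
            }
            events.add("start " + newCommand);
            if (old != null) {
                events.add("kill " + old);
            }
            running = newCommand;
            return true;
        }
        // Strictly one process per container: kill first, restart old version on failure.
        killCurrent();
        if (tryStart(newCommand)) {
            return true;
        }
        if (old != null) {
            tryStart(old); // rollback
        }
        return false;
    }

    private void killCurrent() {
        if (running != null) {
            events.add("kill " + running);
            running = null;
        }
    }

    private boolean tryStart(String command) {
        if (launcher.test(command)) {
            events.add("start " + command);
            running = command;
            return true;
        }
        events.add("failed " + command);
        return false;
    }

    public static void main(String[] args) {
        UpgradeSketch c = new UpgradeSketch(cmd -> !cmd.equals("v2-broken"));
        c.startProcess("v1");
        c.upgradeProcess("v2-broken", RollbackPolicy.KILL_OLD_THEN_RESTART_ON_FAILURE);
        System.out.println(c.events());
        // prints [start v1, kill v1, failed v2-broken, start v1]
    }
}
```

The first policy never leaves the container without a process but needs resources for two processes during the switch; the second keeps the 1-process invariant at the cost of downtime while the new (and possibly the old) version starts.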

If everyone is ok with the above, I volunteer to post a preliminary patch or, if the
details get dicier during investigation, to put up a doc.

Thoughts ?  

> De-link container life cycle from the process and add ability to execute multiple processes
in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
> The AM should be able to exec >1 process in a container, rather than have the NM automatically
release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, which for HBase
would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that something
could be run in the container while a long-lived process was already running. This can be
useful in monitoring and reconfiguring the long-lived process, as well as shutting it down.
