hadoop-yarn-issues mailing list archives

From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container
Date Wed, 24 Feb 2016 20:01:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163708#comment-15163708 ]

Bikas Saha commented on YARN-1040:
----------------------------------

I am not sure we need to place (somewhat artificial) constraints on the app when it's not clear
that it practically affects YARN.

1) Container with no process should be allowed. Apps could terminate all running tasks of
version A, then start running tasks of version B when they are not backwards compatible.
2) Container should be allowed to run multiple processes. This is similar to the existing
process spawning more processes. It is different from that in the sense that the NM has to
add the new process to existing monitoring/cgroups etc.
3) A start-process call should be allowed with no process actually started. This will allow apps to
localize new resources to an existing container. Alternatively, we could create a new localization
API that's delinked from starting the process. But re-localization is an important related
feature that we should look at supporting via this work, because currently it does not work:
it is tied to starting a process.
4) Most current apps already communicate directly with their tasks and hence can shut
them down when they are not needed. However, as suggested above, it may be useful for the
NM to provide a feature whereby the previous task can be shut down when a new task request
is received. Alternatively, the NM could provide a stopProcess API to make that explicit.

IMO all of this should be allowed. The timeline could be different with some being allowed
earlier and some later based on implementation effort.
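
To make points 1-4 concrete, here is a minimal sketch of a container whose lifetime is independent of the processes it hosts. It is a toy in-memory model, not NM code: the class `Container`, its methods, and the `replace_previous` flag are all hypothetical names chosen for illustration, and monitoring/cgroups handling is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical model: the container outlives the processes it hosts."""
    resources: set = field(default_factory=set)    # names of localized resources
    processes: dict = field(default_factory=dict)  # process id -> command
    released: bool = False
    _next_id: int = 0

    def localize(self, *names):
        # Point 3: localization is its own call, not tied to starting a process.
        self._check_live()
        self.resources.update(names)

    def start_process(self, command, replace_previous=False):
        # Point 2: several processes may share one container.
        # Point 4, first option: optionally stop earlier processes first.
        self._check_live()
        if replace_previous:
            self.processes.clear()
        self._next_id += 1
        self.processes[self._next_id] = command
        return self._next_id

    def stop_process(self, pid):
        # Point 4, alternative: an explicit stop call.
        # Point 1: stopping the last process leaves the container allocated.
        self._check_live()
        self.processes.pop(pid)

    def release(self):
        # Giving up the container stops whatever is still running.
        self.processes.clear()
        self.released = True

    def _check_live(self):
        if self.released:
            raise RuntimeError("container already released")


# Version upgrade without giving up the container (point 1):
c = Container()
old = c.start_process("task-vA")
c.stop_process(old)           # container is empty but still held
c.localize("task-vB.jar")     # re-localization with no process start (point 3)
new = c.start_process("task-vB")
```

The design choice this illustrates is that only `release` ends the container; process exit, whether requested or not, does not.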

Thinking ahead, it may be useful for the NM to accept a series of API calls within the same
RPC (with the current mechanism supported as a single command entity for backwards compatibility).
Then we will not have to build a lot of logic into the NM. The app can get all features by
composing a multi-command entity.
E.g.
Current start process = {acquire, localize, start} // where acquire means start container
Current shutdown process = {stop, release} // where release means give up container
Only localize = {localize}
Start another process = {localize, start}
Start another process after shutting down first process = {stop, start} or {stop, localize, start}
Start another process and then shutdown the first process = {start, stop}
New container shutdown = {release} // at this point 0 or more processes may be running, and they will be stopped
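
The compositions listed above can be sketched as an ordered batch of primitive commands applied to a very small state. This is only an illustration of the idea: `Cmd`, `apply_batch`, and the `(held, running)` state tuple are hypothetical, and treating `stop` with nothing running or a second `acquire` as errors is an assumption, not something settled in this discussion.

```python
from enum import Enum


class Cmd(Enum):
    ACQUIRE = "acquire"    # start the container
    LOCALIZE = "localize"  # fetch resources into the container
    START = "start"        # start one more process
    STOP = "stop"          # stop one running process
    RELEASE = "release"    # give up the container


def apply_batch(state, batch):
    """Apply commands in order to state = (held, running_process_count).

    Returns the new state. Raises ValueError for a command that makes no
    sense in the current state (an assumption of this sketch).
    """
    held, running = state
    for cmd in batch:
        if cmd is Cmd.ACQUIRE:
            if held:
                raise ValueError("container already acquired")
            held = True
        elif not held:
            raise ValueError(f"{cmd.value} needs an acquired container")
        elif cmd is Cmd.START:
            running += 1
        elif cmd is Cmd.STOP:
            if running == 0:
                raise ValueError("no process to stop")
            running -= 1
        elif cmd is Cmd.RELEASE:
            # Release stops any processes that are still running.
            held, running = False, 0
        # LOCALIZE changes neither field in this simplified model.
    return held, running


# The compositions from the list above, expressed as batches.
CURRENT_START = [Cmd.ACQUIRE, Cmd.LOCALIZE, Cmd.START]
CURRENT_SHUTDOWN = [Cmd.STOP, Cmd.RELEASE]
ONLY_LOCALIZE = [Cmd.LOCALIZE]
START_ANOTHER = [Cmd.LOCALIZE, Cmd.START]
RESTART = [Cmd.STOP, Cmd.LOCALIZE, Cmd.START]
START_THEN_STOP = [Cmd.START, Cmd.STOP]
NEW_SHUTDOWN = [Cmd.RELEASE]

state = apply_batch((False, 0), CURRENT_START)  # container held, one process
state = apply_batch(state, START_ANOTHER)       # same container, two processes
```

Because the NM only interprets primitives in order, a new app-level behavior is a new list rather than new NM logic, which is the point of the multi-command proposal.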


> De-link container life cycle from the process and add ability to execute multiple processes
> in the same long-lived container
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the NM automatically
> release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, which for HBase
> would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that something
> could be run in the container while a long-lived process was already running. This can be
> useful in monitoring and reconfiguring the long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
