hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3998) Add support in the NodeManager to re-launch containers
Date Mon, 22 Aug 2016 14:08:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430822#comment-15430822 ]

Jason Lowe commented on YARN-3998:
----------------------------------

I believe the old software will ignore unrecognized keys in the state store, so we may be
OK with a rolling downgrade as long as the resulting behavior is expected.  This feature adds
the ability for the NM to re-launch containers, so if we downgrade we lose that ability.
That means a failed container will simply fail instead of being re-launched, but I think
we're OK there.  We lose the optimization
for a faster restart of the container, but AFAIK we don't outright lose containers like we
could with the feature added in YARN-5049.  One issue is that I'm not sure the old software
will clean out the new keys when a container completes, so we might leak some keys in the
state store during a rolling downgrade.
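
The downgrade scenario described above can be sketched as follows. This is only an illustration under stated assumptions: the key layout, the suffix names (`request`, `launched`, `exitcode`, `retries`) and the helper functions are hypothetical and are not taken from the NM state store code. It also assumes the old software deletes only the keys it knows about, which is exactly the point that is uncertain above.

```python
# Hypothetical key suffixes; the real NM state store schema may differ.
OLD_SUFFIXES = {"request", "launched", "exitcode"}
NEW_SUFFIXES = OLD_SUFFIXES | {"retries"}  # "retries" stands in for the new key


def recover(store, known_suffixes):
    """Rebuild per-container state, skipping suffixes this version does not know."""
    containers = {}
    for key, value in store.items():
        container_id, suffix = key.rsplit("/", 1)
        if suffix in known_suffixes:
            containers.setdefault(container_id, {})[suffix] = value
    return containers


def remove_container(store, container_id, known_suffixes):
    """Delete only the keys this version knows about for a completed container."""
    for suffix in known_suffixes:
        store.pop(f"{container_id}/{suffix}", None)


# State written by the new software before the rolling downgrade.
store = {
    "containers/c1/request": "launch-context",
    "containers/c1/launched": "",
    "containers/c1/retries": "2",
}

# The old software recovers the container without seeing the retry key...
recovered = recover(store, OLD_SUFFIXES)
# ...and when the container completes, the retry key is left behind.
remove_container(store, "containers/c1", OLD_SUFFIXES)
leaked = sorted(store)
```

In this sketch recovery succeeds because the unknown key is skipped, and the only side effect of the downgrade is the leftover `retries` entry.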

We could bump the minor version although it's unused.  The minor version carries no meaning
currently, although it could potentially provide clues to code that needs to do a schema migration
across major versions.  In practice I suspect it still won't be used since any such migration
will take into account all the keys it knows about, and at that point it doesn't need to look
at the minor version.  I can't think of a case where we would need the minor version info
to properly implement the migration cases rather than have the migration code auto-detect
based on what keys it finds in the store.
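
To illustrate the auto-detection argument, here is a minimal sketch of migration code that decides what to do from the keys it finds rather than from a minor version number. The `schema-version` key, the `retries` suffix, the `v2/` prefix and both function names are assumptions made up for this example, not part of the actual state store.

```python
# Hypothetical suffix written only by software that supports re-launch.
RETRY_SUFFIX = "retries"
VERSION_KEY = "schema-version"  # hypothetical location of the stored version


def detect_features(store):
    """Infer which optional features were in use from the keys present."""
    features = set()
    for key in store:
        if key.rsplit("/", 1)[-1] == RETRY_SUFFIX:
            features.add("container-relaunch")
    return features


def migrate(store, new_major="2.0"):
    """Move every data key to a new layout without consulting the minor version."""
    migrated = {}
    for key, value in store.items():
        if key == VERSION_KEY:
            continue  # the old version string is replaced, not copied
        migrated["v2/" + key] = value
    migrated[VERSION_KEY] = new_major
    return migrated


old_store = {
    VERSION_KEY: "1.0",
    "containers/c1/request": "launch-context",
    "containers/c1/retries": "2",
}
features = detect_features(old_store)
new_store = migrate(old_store)
```

Because the migration walks every key it finds, the presence or absence of the retry key is handled per entry, and a minor version bump would add no information in this sketch.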

It doesn't look to me like this needs to be flagged as an incompatible change, unless I'm
missing something about the semantics of the container re-launch.

> Add support in the NodeManager to re-launch containers
> ------------------------------------------------------
>
>                 Key: YARN-3998
>                 URL: https://issues.apache.org/jira/browse/YARN-3998
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>             Fix For: 2.9.0
>
>         Attachments: YARN-3998.01.patch, YARN-3998.02.patch, YARN-3998.03.patch,
>                      YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch,
>                      YARN-3998.07.patch, YARN-3998.08.patch, YARN-3998.09.patch
>
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM launches
> containers, it can specify the value. The NM will then re-launch the container
> 'retry-times' times when it fails to run (e.g. its exit code is not 0).
> This will save a lot of time: it avoids container localization, and the RM does not need
> to re-schedule the container. Local files in the container's working directory will also
> be left for re-use (if the container has downloaded some big files, it does not need to
> re-download them when running again).
> We find it useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


