hadoop-yarn-issues mailing list archives

From "Eric Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7939) Yarn Service Upgrade: add support to upgrade a component instance
Date Fri, 20 Apr 2018 21:23:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446382#comment-16446382
] 

Eric Yang commented on YARN-7939:
---------------------------------

After reverting YARN-7973, the error messages disappeared, and I see that the container
started a new instance, which is running.  However, the existing instance was not shut down.

The AM's log doesn't show that a new container has been allocated, and the RM doesn't show
a new container allocation either.  I see this on the node:

{code}
hbase     8413  0.0  0.0  15060  1500 ?        Ss   17:45   0:00 /bin/bash -c sleep 90000
1>/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1524245796717_0002/container_1524245796717_0002_01_000004/stdout.txt
2>/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1524245796717_0002/container_1524245796717_0002_01_000004/stderr.txt

hbase     8435  0.0  0.0   7712   604 ?        S    17:45   0:00 sleep 90000
hbase     8820  0.0  0.0 115244  1460 ?        Ss   20:21   0:00 /bin/bash -c sleep 1200000
1>/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1524248642708_0001/container_1524248642708_0001_01_000002/stdout.txt
2>/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1524248642708_0001/container_1524248642708_0001_01_000002/stderr.txt

{code}

In the current implementation, the AM is only notified of changes after the operations are
done.  If the change was not successful or something fails in the middle, then the AM is stuck
in a component instance upgrade.  We might need a timer that starts at the point the container
is instructed to perform the upgrade and waits for a timeout value.  If the stop and start do
not come back within a reasonable timeframe, a new instance should be launched to replace the
lost instance.  This would avoid getting stuck midway if the node manager did not report back
with a successful state, or if the node manager was lost during the upgrade.  This could
increase the robustness of the upgrade framework and solve the problem that I encountered.
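
A minimal sketch of the timer idea above, written in Python for brevity.  This is not the
service AM code: the class name, method names, instance names, and the 30 second timeout are
all hypothetical, and the clock is injected only so the behaviour can be checked without
waiting.

```python
import time
from typing import Callable, Dict, List


class UpgradeWatchdog:
    """Hypothetical helper: tracks in-flight instance upgrades and reports timeouts."""

    def __init__(self, timeout_secs: float,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout_secs = timeout_secs
        self._clock = clock
        # instance name -> time at which the upgrade was requested
        self._started_at: Dict[str, float] = {}

    def upgrade_requested(self, instance: str) -> None:
        # Start the timer when the container is instructed to upgrade.
        self._started_at[instance] = self._clock()

    def upgrade_confirmed(self, instance: str) -> None:
        # The stop and start came back in time, so stop tracking the instance.
        self._started_at.pop(instance, None)

    def expired(self) -> List[str]:
        # Return instances whose upgrade has not been confirmed within the
        # timeout, and stop tracking them; the caller would then launch
        # replacement instances for the returned names.
        now = self._clock()
        overdue = sorted(name for name, start in self._started_at.items()
                         if now - start >= self._timeout_secs)
        for name in overdue:
            del self._started_at[name]
        return overdue
```

The AM would call expired() periodically.  Any instance it returns is treated as lost, which
covers both a node manager that never reports back and one that disappears during the upgrade.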

> Yarn Service Upgrade: add support to upgrade a component instance 
> ------------------------------------------------------------------
>
>                 Key: YARN-7939
>                 URL: https://issues.apache.org/jira/browse/YARN-7939
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>         Attachments: YARN-7939.001.patch, YARN-7939.002.patch, YARN-7939.003.patch, YARN-7939.004.patch,
YARN-7939.005.patch, YARN-7939.006.patch, YARN-7939.007.patch, YARN-7939.008.patch, serviceam.log
>
>
> Yarn core supports in-place upgrade of containers. A yarn service can leverage that to
provide in-place upgrade of component instances. Please see YARN-7512 for details.
> Will add support to upgrade a single component instance first and then iteratively add other
APIs and features.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

