hadoop-yarn-issues mailing list archives

From "Shane Kumpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
Date Tue, 24 Apr 2018 12:47:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449773#comment-16449773 ]

Shane Kumpf commented on YARN-2674:

I've spent some time looking into which issues are already open for the dshell tests, and
most of the flaky tests are already being tracked:

YARN-7771 - Intermittent failures of tests that leverage TestDistributedShell#testDSShell
YARN-8078 - TestDistributedShell#testDSShellWithoutDomainV2 fails on trunk
YARN-6479 - TestDistributedShell.testDSShellWithoutDomainV1_5 fails
YARN-4385 - TestDistributedShell times out
YARN-4350 - TestDistributedShell fails for V2 scenarios

Even with these known flaky tests commented out, I have yet to get 20 successful runs of
the dshell tests. I'll continue to look into the tests as time permits, but I think we can
move forward with this patch in the meantime.

> Distributed shell AM may re-launch containers if RM work preserving restart happens
> -----------------------------------------------------------------------------------
>                 Key: YARN-2674
>                 URL: https://issues.apache.org/jira/browse/YARN-2674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: applications, resourcemanager
>            Reporter: Chun Chen
>            Assignee: Shane Kumpf
>            Priority: Major
>              Labels: oct16-easy
>         Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch, YARN-2674.4.patch, YARN-2674.5.patch, YARN-2674.6.patch
> Currently, if a work-preserving RM restart happens while distributed shell is running,
> the distributed shell AM may re-launch all of its containers, including new, running, and
> completed ones. We must make sure it does not re-launch the running or completed containers.
> We need to remove allocated containers from AMRMClientImpl#remoteRequestsTable once the AM
> receives them from the RM.
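
The fix described in the issue summary can be illustrated with a small, self-contained sketch. It is not the actual patch and does not use the real AMRMClientImpl: the class `PendingRequestTable` and its methods are hypothetical stand-ins for the client-side request table. It assumes that whatever is still in the table is what the AM would re-send when it re-registers after an RM restart, so dropping an entry as soon as its container is allocated keeps already-satisfied requests from being asked for again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for the request table an AM client keeps.
 * This is NOT the real AMRMClientImpl; names and structure are assumptions.
 */
public class PendingRequestTable {
    // capability key (e.g. "1024MB/1vcore") -> number of containers still wanted
    private final Map<String, Integer> pending = new LinkedHashMap<>();

    /** Record that the AM wants {@code count} more containers of this capability. */
    public void addRequest(String capability, int count) {
        pending.merge(capability, count, Integer::sum);
    }

    /**
     * Called when the RM hands over one container: drop one matching request.
     * Returning null from the remapping function removes the entry entirely.
     */
    public void onContainerAllocated(String capability) {
        pending.computeIfPresent(capability, (k, v) -> v > 1 ? v - 1 : null);
    }

    /** Snapshot of what would be re-sent to a restarted RM on re-registration. */
    public Map<String, Integer> requestsToResend() {
        return new LinkedHashMap<>(pending);
    }

    public static void main(String[] args) {
        PendingRequestTable table = new PendingRequestTable();
        table.addRequest("1024MB/1vcore", 3);

        // Two of the three containers arrive before the RM restarts.
        table.onContainerAllocated("1024MB/1vcore");
        table.onContainerAllocated("1024MB/1vcore");

        // Only the one unsatisfied request remains to be re-sent.
        System.out.println(table.requestsToResend()); // prints {1024MB/1vcore=1}
    }
}
```

Without the removal step in `onContainerAllocated`, the snapshot would still contain all three requests after the restart, which corresponds to the re-launch behaviour the issue describes.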
