hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7086) Release all containers aynchronously
Date Thu, 24 Aug 2017 16:51:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140295#comment-16140295
] 

Arun Suresh commented on YARN-7086:
-----------------------------------

Thanks for chiming in, folks.
And yes, I agree with [~jlowe] too. To move forward, and if everyone is fine with the approach,
I will post a patch that does the following:
* Introduce a *RELEASE_CONTAINERS* scheduler event: will refactor the existing RELEASE_CONTAINER
event to take multiple containers.
* Expose an async release method in the AbstractYarnScheduler that takes a list of containers,
splits the list into sub-lists of some (configured ?) max number of containers released at a
time, and sends an event for each sub-list.
* Route all calls to release containers from both schedulers to the new API. Currently,
the problematic ones are during app attempt complete, node removed and the schedulers' handling
of the AM's explicit release of containers.
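
A rough sketch of the splitting step in the second bullet, just to make the idea concrete. This is not the patch itself: the class BatchedRelease, the ReleaseContainersEvent type, the maxPerEvent parameter and the use of plain String container ids are all hypothetical stand-ins, and the dispatcher is modelled as a simple Consumer rather than the real scheduler event handler.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of batched async release; not the actual YARN-7086 patch. */
public class BatchedRelease {

  /** Stand-in for a RELEASE_CONTAINERS scheduler event that carries several container ids. */
  public static final class ReleaseContainersEvent {
    private final List<String> containerIds;

    public ReleaseContainersEvent(List<String> containerIds) {
      // Copy defensively so the event does not alias the caller's list.
      this.containerIds = Collections.unmodifiableList(new ArrayList<>(containerIds));
    }

    public List<String> getContainerIds() {
      return containerIds;
    }
  }

  /**
   * Splits containerIds into sub-lists of at most maxPerEvent entries and
   * hands one event per sub-list to the dispatcher. Returns the number of
   * events sent. maxPerEvent is an assumed (configurable) limit.
   */
  public static int asyncReleaseContainers(
      List<String> containerIds,
      int maxPerEvent,
      Consumer<ReleaseContainersEvent> dispatcher) {
    if (maxPerEvent <= 0) {
      throw new IllegalArgumentException("maxPerEvent must be positive");
    }
    int sent = 0;
    for (int start = 0; start < containerIds.size(); start += maxPerEvent) {
      int end = Math.min(start + maxPerEvent, containerIds.size());
      dispatcher.accept(new ReleaseContainersEvent(containerIds.subList(start, end)));
      sent++;
    }
    return sent;
  }
}
```

With five container ids and maxPerEvent set to 2, this would dispatch three events carrying 2, 2 and 1 ids respectively. An empty list dispatches nothing.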

> Release all containers aynchronously
> ------------------------------------
>
>                 Key: YARN-7086
>                 URL: https://issues.apache.org/jira/browse/YARN-7086
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>
> We have noticed in production two situations that can cause deadlocks and cause scheduling
> of new containers to come to a halt, especially with regard to applications that have a lot
> of live containers:
> # When these applications release these containers in bulk.
> # When these applications terminate abruptly due to some failure, the scheduler releases
> all their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make sure ALL
> container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc [~leftnoteasy],
> [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the AbstractYarnScheduler
> and a corresponding scheduler event, which is currently used specifically for the container-update
> code paths (where the scheduler releases temp containers which it creates for the update)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

