hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
Date Thu, 19 May 2016 13:45:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291103#comment-15291103 ]

Jason Lowe commented on YARN-1902:
----------------------------------

Sorry for jumping in late, but I'd like to keep moving this forward.  A significant number
of container allocations are wasted as apps adjust their resource requests, and that adds
unnecessary load to the RM and reduces cluster efficiency.

IMHO, without a protocol overhaul the fix has to come from the RM side if we want to minimize
the excess containers.  The inherent problem is that the AM adjusts its resource request
_without_ knowing what the RM has already allocated since the last heartbeat.  Therefore,
if the RM sees an update to the AM's ask during a heartbeat and that same heartbeat's response
already has containers allocated, it needs to reduce the AM's ask by the containers in that
response.  For example:
# AM asks for 5 containers
# On a subsequent heartbeat the RM responds with 1 container
# On the next heartbeat the AM adjusts its ask to 4 containers, but the RM has already allocated
the remaining 4 containers from the original ask.
# The RM needs to interpret the new ask not as 4 more containers but as 0 containers since
4 of them are already satisfied in the current heartbeat's response.
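
A minimal sketch of that adjustment, as I understand it; the names here (AskAdjuster,
adjustAsk) are illustrative only and do not correspond to actual ResourceManager internals.

{code:java}
/**
 * Illustrative sketch of the RM-side adjustment described above; not
 * actual ResourceManager code.
 */
public class AskAdjuster {
  /**
   * Interpret an updated total (ANY) ask in light of containers that are
   * already sitting in the same heartbeat's response.
   */
  static int adjustAsk(int newTotalAsk, int allocatedInThisResponse) {
    // Containers already allocated in this response count against the
    // AM's updated total, so subtract them; never go below zero.
    return Math.max(0, newTotalAsk - allocatedInThisResponse);
  }

  public static void main(String[] args) {
    // The example above: the AM re-asks for 4 while 4 containers are
    // already in the current heartbeat's response, so the effective
    // remaining ask is 0.
    System.out.println(adjustAsk(4, 4)); // prints 0
  }
}
{code}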

If apps were well behaved, I think we could get most of the benefit by simply reducing the
new total (ANY) ask by the number of containers in the same heartbeat's response.  It's true
that an AM could get containers in that response that don't match its request, but a
well-behaved app should realize that any container received counts against the total (ANY)
resource request.  Therefore, if the app throws away the container but still needs another,
it must update at least the total container request to ask for a replacement.
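
For reference, here is a rough sketch of that well-behaved pattern against the public
AMRMClient API.  The matchesNeed and launch helpers are hypothetical app-specific stubs.

{code:java}
import java.io.IOException;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class WellBehavedAM {
  // Every received container counts against the total (ANY) ask, so a
  // container the app throws away must be both released and re-requested.
  void handleHeartbeat(AMRMClient<ContainerRequest> client,
      Resource capability, Priority priority, float progress)
      throws YarnException, IOException {
    AllocateResponse response = client.allocate(progress);
    for (Container c : response.getAllocatedContainers()) {
      if (matchesNeed(c)) {
        launch(c);
      } else {
        // Unwanted container: give it back and re-ask so the total
        // request reflects the replacement we still need.
        client.releaseAssignedContainer(c.getId());
        client.addContainerRequest(
            new ContainerRequest(capability, null, null, priority));
      }
    }
  }

  // Hypothetical app-specific helpers, stubbed for the sketch.
  boolean matchesNeed(Container c) { return true; }
  void launch(Container c) { }
}
{code}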

> Allocation of too many containers when a second request is done with the same resource capability
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1902
>                 URL: https://issues.apache.org/jira/browse/YARN-1902
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>            Reporter: Sietse T. Au
>            Assignee: Sietse T. Au
>              Labels: client
>         Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch
>
>
> Regarding AMRMClientImpl
> Scenario 1:
> Given a ContainerRequest x with Resource y: when addContainerRequest is called z times with x,
> allocate is called, and at least one of the z allocated containers is started, then if another
> addContainerRequest call is made, followed by an allocate call to the RM, (z+1) containers
> will be allocated, where 1 container is expected.
> Scenario 2:
> No containers are started between the allocate calls.
> Analyzing debug logs of the AMRMClientImpl, I have found that (z+1) containers are indeed
> requested in both scenarios, but that the correct behavior is observed only in the second
> scenario.
> Looking at the implementation, I have found that this (z+1) request is caused by the structure
> of the remoteRequestsTable.  The consequence of Map<Resource, ResourceRequestInfo> is that
> ResourceRequestInfo does not hold any information about whether a request has already been
> sent to the RM.
> There are workarounds for this, such as releasing the excess containers received.
> The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when
> a request has been successfully sent to the RM.
> The patch includes a test in which scenario one is tested.
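
For anyone reproducing this, the quoted scenario 1 boils down to roughly the following
AMRMClient call sequence (a sketch; the client is assumed initialized and started, and the
resource values are illustrative):

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class Scenario1 {
  // Sketch of scenario 1; amRMClient is assumed to be an initialized,
  // started client, and the resource values are illustrative.
  static void reproduce(AMRMClient<ContainerRequest> amRMClient, int z)
      throws Exception {
    Resource y = Resource.newInstance(1024, 1);
    Priority pri = Priority.newInstance(0);
    ContainerRequest x = new ContainerRequest(y, null, null, pri);

    for (int i = 0; i < z; i++) {
      amRMClient.addContainerRequest(x);   // z identical requests
    }
    amRMClient.allocate(0.0f);             // RM eventually allocates z containers
    // ... start at least one of the z containers ...
    amRMClient.addContainerRequest(x);     // ask for one more
    amRMClient.allocate(0.0f);             // bug: (z+1) more are allocated,
                                           // where 1 is expected
  }
}
{code}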




