hadoop-yarn-issues mailing list archives

From "Peter D Kirchner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
Date Tue, 13 Jan 2015 19:59:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275825#comment-14275825 ]

Peter D Kirchner commented on YARN-3020:

I investigated the rates reported in the third paragraph of my comment immediately above, and
found that an application can issue addContainerRequest() calls much faster than I stated
there.  It turns out my application has some extrinsic delay between addContainerRequest()
calls, and that delay, not the API, predominated in limiting the rate I measured and reported.
Bear in mind, too, that the elapsed time of the client-API call to addContainerRequest() does
not measure the performance impact of the reported over-requests sent to the server or of the
resulting over-allocation of containers.

To follow up, I measured addContainerRequest() timing with System.nanoTime().  The first call
to addContainerRequest() takes around 5 milliseconds.  The rest take around half a millisecond
on average.  Here are some statistics for calling addContainerRequest(), in microseconds:
average=433 count=914 max=11202 min=223.  I measure similar times for consecutive calls (with
no additional application delay between the addContainerRequest() calls).
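
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of
the timing loop.  It is not the code I ran: the class name CallTimer and the stand-in
action are hypothetical, and in a real application master the action would wrap the
call to addContainerRequest().

```java
import java.util.LongSummaryStatistics;

/** Hypothetical helper: collects per-call latency statistics for an arbitrary action. */
class CallTimer {

    /** Runs the action count times and records each call's elapsed time in microseconds. */
    static LongSummaryStatistics time(Runnable action, int count) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            action.run();
            long elapsedNanos = System.nanoTime() - start;
            stats.accept(elapsedNanos / 1_000); // nanoseconds -> microseconds
        }
        return stats;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption). In an application master this would be
        // something like () -> client.addContainerRequest(request), where client
        // and request are the application's own objects.
        Runnable standIn = () -> Math.sqrt(System.nanoTime());

        LongSummaryStatistics stats = time(standIn, 914);
        System.out.printf("microseconds average=%.0f count=%d max=%d min=%d%n",
                stats.getAverage(), stats.getCount(), stats.getMax(), stats.getMin());
    }
}
```

Timing each call separately, rather than the whole loop, keeps any delay the
application adds between calls out of the per-call figures.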

Once the over-request bug is fixed, I will still find it tedious to call addContainerRequest()
1000 times for 1000 identical containers, but many applications can probably afford the half
second to do so.  Arguably, the bug exists in part because of the tedious bookkeeping these
requests require on the yarn-client-api side.  If it can be done in the course of bug-fixing
or cleanup, a change that re-introduces an integer quantity with the request would be welcome.
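
Until something like that exists, an application can hide the repetition behind a small
wrapper.  The sketch below is only a client-side illustration: BulkRequests and
addContainerRequests() are hypothetical names, not part of the YARN API, and the wrapper
still makes one call per container, so it neither carries a quantity in the request
nor works around the over-request bug.

```java
import java.util.function.Consumer;

/** Hypothetical client-side helper; not part of the YARN API. */
class BulkRequests {

    /**
     * Calls addOne.accept(request) quantity times.
     * addOne stands in for the client's single-request method.
     */
    static <T> void addContainerRequests(Consumer<T> addOne, T request, int quantity) {
        if (quantity < 0) {
            throw new IllegalArgumentException("quantity must be non-negative");
        }
        for (int i = 0; i < quantity; i++) {
            addOne.accept(request);
        }
    }
}
```

With a real client this might be invoked as
BulkRequests.addContainerRequests(client::addContainerRequest, request, 1000),
assuming client and request are the application's own AMRMClient and ContainerRequest
objects.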

> n similar addContainerRequest()s produce n*(n+1)/2 containers
> -------------------------------------------------------------
>                 Key: YARN-3020
>                 URL: https://issues.apache.org/jira/browse/YARN-3020
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>            Reporter: Peter D Kirchner
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> BUG: If the application master calls addContainerRequest() n times, but with the same
> priority, I get up to 1+2+3+...+n containers = n*(n+1)/2.  The most containers are requested
> when the interval between calls to addContainerRequest() exceeds the heartbeat interval of
> calls to allocate() (in AMRMClientImpl's run() method).
> If the application master calls addContainerRequest() n times, but with a unique priority
> each time, I get n containers (as I intended).
> Analysis:
> There is a logic problem in AMRMClientImpl.java.
> Although allocate() in AMRMClientImpl.java does an ask.clear(), on subsequent calls to
> addContainerRequest(), addResourceRequest() finds the previous matching remoteRequest and
> increments the container count rather than starting anew, and does an addResourceRequestToAsk()
> which defeats the ask.clear().
> From documentation and code comments, it was hard for me to discern the intended behavior
> of the API, but the inconsistency reported in this issue suggests one case or the other is
> implemented incorrectly.
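
The arithmetic in the analysis above can be checked with a toy model.  This is a sketch,
not the real AMRMClientImpl: the class OverRequestModel and its two maps are hypothetical
stand-ins for the remote-request table and the ask set, priorities are plain ints, and
the model assumes the worst case, in which a heartbeat falls between every pair of
calls and every container asked for in a heartbeat is counted as requested.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the bookkeeping described in the analysis; not the real AMRMClientImpl. */
class OverRequestModel {
    // priority -> container count remembered across heartbeats (never reset in this model)
    private final Map<Integer, Integer> remoteRequests = new HashMap<>();
    // priority -> count to send on the next heartbeat; cleared by allocate()
    private final Map<Integer, Integer> ask = new HashMap<>();
    private int containersRequested = 0;

    /** Increments the remembered count for the priority and puts the new total into the ask. */
    void addContainerRequest(int priority) {
        int count = remoteRequests.merge(priority, 1, Integer::sum);
        ask.put(priority, count);
    }

    /** Models one heartbeat: sends everything in the ask, then clears it. */
    void allocate() {
        for (int count : ask.values()) {
            containersRequested += count;
        }
        ask.clear();
    }

    /** n requests with a heartbeat after each one; returns the total containers asked for. */
    static int simulate(int n, boolean uniquePriorities) {
        OverRequestModel model = new OverRequestModel();
        for (int i = 0; i < n; i++) {
            model.addContainerRequest(uniquePriorities ? i : 0);
            model.allocate();
        }
        return model.containersRequested;
    }

    public static void main(String[] args) {
        System.out.println(simulate(4, false)); // same priority: 1+2+3+4 = 10
        System.out.println(simulate(4, true));  // unique priorities: 4
    }
}
```

With the same priority each heartbeat re-sends the running total, giving n*(n+1)/2;
with unique priorities each heartbeat sends a count of 1, giving n.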

This message was sent by Atlassian JIRA
