hadoop-yarn-issues mailing list archives

From "Peter D Kirchner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3020) n similar addContainerRequest()s produce n*(n+1)/2 containers
Date Mon, 12 Jan 2015 19:12:36 GMT

    [ https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273990#comment-14273990 ]

Peter D Kirchner commented on YARN-3020:
----------------------------------------

It looks like the bug may have come in with the code reorganization of r1494017 on 2013-06-18.
 I did not follow the log past this introduction of AMRMClient.java in its present form and
location.

In my code on my system (and I am supposing also in yours) each addContainerRequest() is taking
about a second even without a sleep.  The heartbeat I set in createAMRMClientAsync() was 1000
milliseconds (1 second), so I set it to 10 seconds to rule out that the addContainerRequest()
was somehow synchronous with allocate().  FWIW, for 10 containers requested, I got 17 containers
with a heartbeat of 10 seconds.  One heartbeat call to allocate() produced 7 containers, the
next call produced 10.  On each heartbeat where the AMRMClient detects a change (in the number
of containers the AM has "add"ed) that needs to be sent to the RM, it sends the then-current
total, not the diff.
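
A minimal sketch of that behavior, as a toy model and not the AMRMClientImpl source.  The class
AskModel and its fields are hypothetical names, it tracks a single priority only, and it assumes
the RM grants every container it is asked for on each heartbeat:

```java
// Toy model of "send the running total, not the diff". NOT Hadoop code.
final class AskModel {
    private int remoteTotal = 0;    // stand-in for the count in the remote request table
    private boolean dirty = false;  // stand-in for "the ask is non-empty"

    // Each add bumps the running total and marks it as needing to be sent.
    void addContainerRequest() {
        remoteTotal++;
        dirty = true;
    }

    // One heartbeat. Assumption: the RM grants exactly the count it receives.
    int allocate() {
        if (!dirty) {
            return 0;               // nothing changed, nothing sent
        }
        dirty = false;              // models ask.clear()
        return remoteTotal;         // the total is sent, not the number newly added
    }

    // addsPerHeartbeat[i] = number of adds made before heartbeat i.
    static int simulate(int[] addsPerHeartbeat) {
        AskModel model = new AskModel();
        int granted = 0;
        for (int adds : addsPerHeartbeat) {
            for (int i = 0; i < adds; i++) {
                model.addContainerRequest();
            }
            granted += model.allocate();
        }
        return granted;
    }

    public static void main(String[] args) {
        // 7 adds before the first heartbeat, 3 more before the second: 7 + 10
        System.out.println(simulate(new int[] {7, 3}));
        // one add per heartbeat, 100 times: 1 + 2 + ... + 100
        int[] onePerHeartbeat = new int[100];
        java.util.Arrays.fill(onePerHeartbeat, 1);
        System.out.println(simulate(onePerHeartbeat));
    }
}
```

Under those assumptions the first case yields 17 containers for 10 requests and the second
yields 5,050 for 100 requests, matching the numbers in this comment.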

Limiting the AM to ~1 container request per second is impractical, so the bug may initially
even be helpful: the application does not have to wait 2 minutes to assemble 100 containers.
All it needs to do is call addContainerRequest() about 15 times, taking about 15 seconds
with a 1 second heartbeat.  Either the addContainerRequest() performance will need to
be improved, or the limitation of 1 container per addContainerRequest() introduced in r1503960
on 2013-07-16 will need to be reversed.

But by the time one naively requests 100 containers and gets 5,050, the bug is probably hurting
application and cluster performance.  Maybe a lot.

> n similar addContainerRequest()s produce n*(n+1)/2 containers
> -------------------------------------------------------------
>
>                 Key: YARN-3020
>                 URL: https://issues.apache.org/jira/browse/YARN-3020
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>            Reporter: Peter D Kirchner
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> BUG: If the application master calls addContainerRequest() n times, but with the same
> priority, I get up to 1+2+3+...+n containers = n*(n+1)/2.  The most containers are requested
> when the interval between calls to addContainerRequest() exceeds the heartbeat interval of
> calls to allocate() (in AMRMClientImpl's run() method).
> If the application master calls addContainerRequest() n times, but with a unique priority
> each time, I get n containers (as I intended).
> Analysis:
> There is a logic problem in AMRMClientImpl.java.
> Although allocate() in AMRMClientImpl.java does an ask.clear(), on subsequent calls to
> addContainerRequest(), addResourceRequest() finds the previous matching remoteRequest and
> increments the container count rather than starting anew, and does an addResourceRequestToAsk()
> which defeats the ask.clear().
> From documentation and code comments, it was hard for me to discern the intended behavior
> of the API, but the inconsistency reported in this issue suggests one case or the other is
> implemented incorrectly.
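
The difference between the two cases in the quoted description can be sketched with a toy model
keyed by priority.  This is an illustration, not the AMRMClientImpl source: PriorityAskModel,
remoteRequests and run() are hypothetical names, requests are matched on priority alone, and the
RM is assumed to grant whatever count each heartbeat sends:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the quoted analysis. NOT Hadoop code.
final class PriorityAskModel {
    // priority -> accumulated container count (stand-in for the remote request table)
    private final Map<Integer, Integer> remoteRequests = new HashMap<>();
    // priorities whose entry must be sent on the next heartbeat (stand-in for the ask)
    private final Set<Integer> ask = new HashSet<>();

    void addContainerRequest(int priority) {
        // A matching entry is found and incremented rather than started anew...
        remoteRequests.merge(priority, 1, Integer::sum);
        // ...and put back into the ask, which is what defeats the earlier clear().
        ask.add(priority);
    }

    // One heartbeat: send the accumulated count for every priority in the ask.
    int allocate() {
        int requested = 0;
        for (int priority : ask) {
            requested += remoteRequests.get(priority);
        }
        ask.clear();
        return requested; // assumption: the RM grants all of it
    }

    // One addContainerRequest() per heartbeat, n times.
    static int run(int n, boolean uniquePriorities) {
        PriorityAskModel model = new PriorityAskModel();
        int granted = 0;
        for (int i = 0; i < n; i++) {
            model.addContainerRequest(uniquePriorities ? i : 0);
            granted += model.allocate();
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(run(4, false)); // same priority: 1 + 2 + 3 + 4
        System.out.println(run(4, true));  // unique priorities: 1 + 1 + 1 + 1
    }
}
```

With one add per heartbeat, the same-priority case accumulates to n*(n+1)/2 in this model, while
the unique-priority case stays at n because every entry is created fresh with a count of 1.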



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
