hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4481) negative pending resource of queues lead to applications in accepted status inifnitly
Date Fri, 26 Feb 2016 18:59:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169550#comment-15169550 ]

Eric Payne commented on YARN-4481:
----------------------------------

Not sure if this is related, but we are also seeing similar results in 2.7 for reserved containers:
{noformat}
    "name" : "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=bigmem",
...
    "ReservedMB" : -6553600,
    "ReservedVCores" : -8000,
    "ReservedContainers" : -800,
...
{noformat}
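
A minimal sketch for spotting values like these in a JMX dump. It is not part of the original report. It assumes the dump is JSON with a top-level "beans" array, as served by the ResourceManager's /jmx endpoint; the function name negative_queue_metrics and the sample document are hypothetical, with sample values copied from the snippet above.

```python
import json

# Metric-name prefixes taken from the JMX snippets in this thread.
PREFIXES = ("Pending", "Reserved")


def negative_queue_metrics(jmx_doc):
    """Return {bean name: {metric: value}} for negative Pending*/Reserved* metrics.

    Assumption: jmx_doc is the parsed JSON dump with a top-level "beans" list.
    """
    findings = {}
    for bean in jmx_doc.get("beans", []):
        name = bean.get("name", "")
        # Only queue-level beans are of interest here.
        if "name=QueueMetrics" not in name:
            continue
        bad = {
            key: value
            for key, value in bean.items()
            if key.startswith(PREFIXES)
            and isinstance(value, (int, float))
            and not isinstance(value, bool)
            and value < 0
        }
        if bad:
            findings[name] = bad
    return findings


# Illustrative input only: the first bean reuses the values quoted above,
# the second bean is a made-up healthy queue for contrast.
sample = json.loads("""
{"beans": [
  {"name": "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=bigmem",
   "ReservedMB": -6553600, "ReservedVCores": -8000, "ReservedContainers": -800},
  {"name": "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root",
   "PendingMB": 4096}
]}
""")

for bean_name, metrics in negative_queue_metrics(sample).items():
    print(bean_name, metrics)
```

Running this against a saved dump on a schedule would show when a queue's counters first go negative, which may help correlate the drift with application kills or reservation events.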

> negative pending resource of queues lead to applications in accepted status inifnitly
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-4481
>                 URL: https://issues.apache.org/jira/browse/YARN-4481
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.7.2
>            Reporter: gu-chi
>            Priority: Critical
>         Attachments: jmx.txt
>
>
> Encountered negative pending resources with the capacity scheduler. JMX shows:
> {noformat}
>     "PendingMB" : -4096,
>     "PendingVCores" : -1,
>     "PendingContainers" : -1,
> {noformat}
> Full JMX information is attached.
> This is not just a JMX UI issue: the queue's actual pending resource is also negative, as seen in this debug log:
> bq. DEBUG | ResourceManager Event Processor | Skip this queue=root, because it doesn't need more resource, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY node-partition= | ParentQueue.java
> This leads to the {{NULL_ASSIGNMENT}}.
> Background: hundreds of applications were submitted, consuming all cluster resources, and reservations occurred. While they were running, network faults were injected with a tool; the injection types were delay, jitter, repeat, packet loss, and disorder. Most of the submitted applications were then killed.
> Is anyone else seeing negative pending resources, or does anyone have an idea of how this happens?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
