hadoop-yarn-issues mailing list archives

From "Dmytro Kabakchei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4698) Negative value in RM UI counters due to double container release
Date Wed, 17 Feb 2016 15:18:18 GMT

     [ https://issues.apache.org/jira/browse/YARN-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Kabakchei updated YARN-4698:
-----------------------------------
    Description: 
We noticed that on our cluster there are negative values in RM UI counters:
- Containers Running: -19
- Memory Used: -38GB
- Vcores Used: -19

After checking the RM logs, we found that the following events had occurred:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some related log records can be found in the "Example.log-cut" attachment.
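
The counters above are consistent with each "invalid container released" event being subtracted one extra time. The following is a minimal sketch of that arithmetic, not ResourceManager code. The class name is hypothetical, and the per-container size of 2 GB and 1 vcore is an assumption inferred from -38GB / 19 and -19 / 19; the report does not state it.

```java
// Hypothetical reconstruction of the counter arithmetic; this is not RM code.
class CounterArithmetic {
    // Event counts taken from the RM log summary in the report.
    static final int ASSIGNED = 67019;
    static final int RELEASED = 67019;
    static final int INVALID_RELEASES = 19;

    // Assumption: every affected container held 2 GB and 1 vcore.
    static final int GB_PER_CONTAINER = 2;
    static final int VCORES_PER_CONTAINER = 1;

    // Each invalid release is treated as one extra decrement.
    static int containersRunning() {
        return ASSIGNED - RELEASED - INVALID_RELEASES;
    }

    static int memoryUsedGb() {
        return containersRunning() * GB_PER_CONTAINER;
    }

    static int vcoresUsed() {
        return containersRunning() * VCORES_PER_CONTAINER;
    }

    public static void main(String[] args) {
        System.out.println("Containers Running: " + containersRunning()); // -19
        System.out.println("Memory Used: " + memoryUsedGb() + "GB");      // -38GB
        System.out.println("Vcores Used: " + vcoresUsed());               // -19
    }
}
```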

After some investigation, we concluded that there is a race condition affecting a container
that was scheduled to be killed but completed successfully before the kill took effect.
There is also a patch that may mitigate the effects of the issue but does not solve the
underlying problem (see mitigating2.5.1.diff).
Unfortunately, the cluster and all other logs are lost, because the report was written about
a year ago but was not submitted properly. We also do not know whether the issue exists in
other versions.
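
To illustrate the suspected mechanism, here is a minimal sketch of how two release paths (kill and normal completion) reaching the same container can push a counter below zero, and how an idempotent release avoids it. It is an illustration only: the class and method names are hypothetical, it does not reproduce the FairScheduler code, and it is not the content of the attached mitigating patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a "containers running" counter; not YARN code.
class ReleaseGuardSketch {
    private final Set<String> live = ConcurrentHashMap.newKeySet();
    private final AtomicInteger running = new AtomicInteger();

    void assign(String containerId) {
        if (live.add(containerId)) {
            running.incrementAndGet();
        }
    }

    // Unguarded: decrements on every release event, even a repeated one.
    void releaseUnguarded(String containerId) {
        live.remove(containerId);
        running.decrementAndGet();
    }

    // Guarded: only the caller that actually removes the ID decrements.
    boolean releaseGuarded(String containerId) {
        if (!live.remove(containerId)) {
            return false; // already released by the other path
        }
        running.decrementAndGet();
        return true;
    }

    int running() {
        return running.get();
    }

    public static void main(String[] args) {
        ReleaseGuardSketch unguarded = new ReleaseGuardSketch();
        unguarded.assign("container_1");
        unguarded.releaseUnguarded("container_1"); // completion path
        unguarded.releaseUnguarded("container_1"); // kill path arrives late
        System.out.println("unguarded: " + unguarded.running()); // -1

        ReleaseGuardSketch guarded = new ReleaseGuardSketch();
        guarded.assign("container_1");
        guarded.releaseGuarded("container_1");
        guarded.releaseGuarded("container_1"); // ignored, returns false
        System.out.println("guarded: " + guarded.running()); // 0
    }
}
```

A guard like this only keeps the counters from going negative. It does not explain why both paths fire for the same container, which is the underlying problem the report describes.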

  was:
We noticed that on our cluster there are negative values in RM UI counters:
-Containers Running: -19
-Memory Used: -38GB
-Vcores Used: -19

After we checked RM logs, we found, that the following events had happened:
- Assigned container: 67019 times
- Released container: 67019 times
- Invalid container released: 19 times

Some log records related can be found within "Example.log-cut" attachment.

After some investigation we made a conclusion that there is some kind of race condition for
container that was scheduled for killing, but was completed successfully before kill.
Also, there is a patch that possibly mitigates effects of the issue, but doesn't solve original
problem (see mitigating2.5.1diff).
Unfortunately, the cluster and all other logs are lost, because the report was made about
a year ago, but wasn't submitted properly. Also, we don't know if the issue exist in other
versions.


> Negative value in RM UI counters due to double container release
> ----------------------------------------------------------------
>
>                 Key: YARN-4698
>                 URL: https://issues.apache.org/jira/browse/YARN-4698
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler, resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Dmytro Kabakchei
>            Priority: Minor
>         Attachments: Example.log-cut, mitigating2.5.1.diff
>
>
> We noticed that on our cluster there are negative values in RM UI counters:
> - Containers Running: -19
> - Memory Used: -38GB
> - Vcores Used: -19
> After checking the RM logs, we found that the following events had occurred:
> - Assigned container: 67019 times
> - Released container: 67019 times
> - Invalid container released: 19 times
> Some related log records can be found in the "Example.log-cut" attachment.
> After some investigation, we concluded that there is a race condition affecting a container
> that was scheduled to be killed but completed successfully before the kill took effect.
> There is also a patch that may mitigate the effects of the issue but does not solve the
> underlying problem (see mitigating2.5.1.diff).
> Unfortunately, the cluster and all other logs are lost, because the report was written about
> a year ago but was not submitted properly. We also do not know whether the issue exists in
> other versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
