hadoop-yarn-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
Date Mon, 18 May 2015 09:48:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547784#comment-14547784 ]

Hadoop QA commented on YARN-3655:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 32s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 18s | The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |  60m 18s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | |  96m 33s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|  |  Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.isHDFS; locked 66% of time. Unsynchronized access at FileSystemRMStateStore.java:[line 156] |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733478/YARN-3655.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a46506d |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7964/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7964/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7964/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7964/console |


This message was automatically generated.

> FairScheduler: potential livelock due to maxAMShare limitation and container reservation
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-3655
>                 URL: https://issues.apache.org/jira/browse/YARN-3655
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-3655.000.patch, YARN-3655.001.patch
>
>
> FairScheduler: potential livelock due to maxAMShare limitation and container reservation.
> If a node is reserved by an application, no other application has any chance to assign a new container on that node until the reserving application either assigns a new container on the node or releases its reservation.
> The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it blocks all other applications from using the nodes it has reserved. If all the other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock can occur.
> The following code in FSAppAttempt#assignContainer can cause this potential livelock:
> {code}
>     // Check the AM resource usage for the leaf queue
>     if (!isAmRunning() && !getUnmanagedAM()) {
>       List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>       if (ask.isEmpty() || !getQueue().canRunAppAM(
>           ask.get(0).getCapability())) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("Skipping allocation because maxAMShare limit would " +
>               "be exceeded");
>         }
>         return Resources.none();
>       }
>     }
> {code}
> To fix this issue, we can unreserve the node when the AM container can't be allocated on it due to the maxAMShare limitation and the node is reserved by this application.
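> A minimal sketch of that idea follows (illustrative only, not the attached patch). It assumes the FSSchedulerNode parameter {{node}} of assignContainer is in scope and that helpers such as node.getReservedContainer() and FSAppAttempt#unreserve(Priority, FSSchedulerNode) behave as their names suggest; the actual patch may use different methods.
> {code}
>     // Check the AM resource usage for the leaf queue
>     if (!isAmRunning() && !getUnmanagedAM()) {
>       List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
>       if (ask.isEmpty() || !getQueue().canRunAppAM(
>           ask.get(0).getCapability())) {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("Skipping allocation because maxAMShare limit would " +
>               "be exceeded");
>         }
>         // Sketch: if this attempt holds the reservation on this node, drop it
>         // before bailing out, so other applications can use the node and the
>         // livelock described above is broken.
>         RMContainer reserved = node.getReservedContainer();
>         if (reserved != null && reserved.getApplicationAttemptId().equals(
>             getApplicationAttemptId())) {
>           unreserve(reserved.getReservedPriority(), node);
>         }
>         return Resources.none();
>       }
>     }
> {code}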



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
