hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "genericqa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7790) Improve Capacity Scheduler Async Scheduling to better handle node failures
Date Wed, 24 Jan 2018 04:09:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336866#comment-16336866
] 

genericqa commented on YARN-7790:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 10s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 38s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 37s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 28s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 40s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color}
| {color:green} branch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  4s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 24s{color} |
{color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 39s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 35s{color} | {color:green}
the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 25s{color}
| {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
The patch generated 6 new + 118 unchanged - 1 fixed = 124 total (was 119) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 38s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m  7s{color}
| {color:green} patch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 11s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 23s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 44s{color} | {color:red}
hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}107m 51s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
|
|   | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7790 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12907413/YARN-7790.002.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient
 findbugs  checkstyle  |
| uname | Linux 27d04b7b4e3b 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e307edc |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19423/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
|
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/19423/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19423/testReport/ |
| Max. process+thread count | 885 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19423/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Improve Capacity Scheduler Async Scheduling to better handle node failures
> --------------------------------------------------------------------------
>
>                 Key: YARN-7790
>                 URL: https://issues.apache.org/jira/browse/YARN-7790
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-7790.001.patch, YARN-7790.002.patch
>
>
> This is not a new issue but async scheduling makes it worse:
> In sync scheduling, if an AM container allocated to a node, it assumes node just heartbeat
to RM, and AM launcher will connect NM to launch the container. Even though it is possible
that NM crashes after the heartbeat, which causes AM hangs for a while. But it is related
rare.
> In async scheduling world, multiple AM containers can be placed on a problematic NM,
which could cause application hangs easily. Discussed with [~sunilg] and [~jianhe] , we need
one fix:
> When async scheduling enabled:
>  - Skip node which missed X node heartbeat.
> And in addition, it's better to reduce wait time by setting following configs to earlier
fail a container being launched at an NM with connectivity issue.
> {code:java}
> RetryPolicy retryPolicy =
>     createRetryPolicy(conf,
>       YarnConfiguration.CLIENT_NM_CONNECT_MAX_WAIT_MS,
>       YarnConfiguration.DEFAULT_CLIENT_NM_CONNECT_MAX_WAIT_MS,
>       YarnConfiguration.CLIENT_NM_CONNECT_RETRY_INTERVAL_MS,
>       YarnConfiguration.DEFAULT_CLIENT_NM_CONNECT_RETRY_INTERVAL_MS);
> {code}
> The second part is not covered by the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message