hadoop-yarn-issues mailing list archives

From "Jonathan Hung (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2
Date Fri, 07 Sep 2018 18:04:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607445#comment-16607445 ]

Jonathan Hung edited comment on YARN-8200 at 9/7/18 6:03 PM:
-------------------------------------------------------------

Build https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 timed
out:
{noformat}cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
-Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy
-Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
2>&1
Elapsed:   2m 40s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
-Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy
-Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
2>&1
Elapsed:  15m 20s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
-Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy
-Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt
2>&1
Elapsed:   4m 49s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
-Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy
-Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
2>&1
Elapsed:  79m 41s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
-Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy
-Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
2>&1
Elapsed:   3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0
-Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy
-Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
2>&1
Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :. : True
Logical operation result is TRUE
Running script  : #!/bin/bash{noformat}
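
As a rough check on where the time went, the "Elapsed:" values reported above can be summed. The sketch below is only illustrative: the helper name total_elapsed_seconds is hypothetical, and the regular expression assumes the "Nm Ns" format seen in this log. Note that the 500-minute limit applies to the whole Jenkins job, so stages not shown in this excerpt also count toward it.

```python
import re

# "Elapsed:" lines copied from the console output above.
LOG = """\
Elapsed:   2m 40s
Elapsed:  15m 20s
Elapsed:   4m 49s
Elapsed:  79m 41s
Elapsed:   3m 59s
"""

# Assumes the "<minutes>m <seconds>s" format seen in this log.
ELAPSED_RE = re.compile(r"^Elapsed:\s+(\d+)m\s+(\d+)s\s*$", re.MULTILINE)


def total_elapsed_seconds(log_text):
    """Sum all 'Elapsed: Xm Ys' entries in the text, in seconds."""
    return sum(int(m) * 60 + int(s) for m, s in ELAPSED_RE.findall(log_text))


total = total_elapsed_seconds(LOG)
print(f"{total // 60}m {total % 60}s")  # prints "106m 29s"
```

The five modules that completed account for about 106 minutes in total, and the hadoop-yarn-client run never reports an elapsed time before the build is aborted.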

The unit tests appear to hang in hadoop-yarn-client while running TestRMFailover (https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt):
{noformat}[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hadoop-yarn-client
---
[INFO] Compiling 34 source files to /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java:[311,6]
[deprecation] MiniYARNCluster(String,int,int,int,int,boolean) in MiniYARNCluster has been
deprecated
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java:[453,16]
[deprecation] onIncreaseContainerResourceError(ContainerId,Throwable) in AbstractCallbackHandler
has been deprecated
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java:[306,16]
[deprecation] onContainerResourceIncreased(ContainerId,Resource) in AbstractCallbackHandler
has been deprecated
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java:[205,62]
[unchecked] unchecked call to handle(T) as a member of the raw type EventHandler
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-yarn-client ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.yarn.client.TestYarnApiClasses
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.553 s - in org.apache.hadoop.yarn.client.TestYarnApiClasses
[INFO] Running org.apache.hadoop.yarn.client.TestRMFailover{noformat}

However, this is similar to the HDFS unit test hangs in HADOOP-15711/HDFS-12711, so I suspect
the hang is not caused by the unit test itself.
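
One way to locate such a hang in a saved Surefire log is to list the test classes that have a "Running" line but no matching "Tests run: ... - in" summary. The following is a minimal sketch, not part of the precommit tooling: the helper name unfinished_tests is hypothetical, and the patterns assume the Surefire output format shown in the excerpt above.

```python
import re

# Tail of the Surefire output quoted above.
SUREFIRE_LOG = """\
[INFO] Running org.apache.hadoop.yarn.client.TestYarnApiClasses
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.553 s - in org.apache.hadoop.yarn.client.TestYarnApiClasses
[INFO] Running org.apache.hadoop.yarn.client.TestRMFailover
"""

# Assumed line formats, based on the excerpt above.
RUNNING_RE = re.compile(r"^\[INFO\] Running (\S+)", re.MULTILINE)
FINISHED_RE = re.compile(
    r"^\[(?:INFO|WARNING|ERROR)\] Tests run: .* - in (\S+)", re.MULTILINE
)


def unfinished_tests(log_text):
    """Return test classes that started but never printed a summary line."""
    finished = set(FINISHED_RE.findall(log_text))
    return [name for name in RUNNING_RE.findall(log_text) if name not in finished]


print(unfinished_tests(SUREFIRE_LOG))
# prints ['org.apache.hadoop.yarn.client.TestRMFailover']
```

With -Pparallel-tests several classes may be in flight at once, so a real log could list more than one unfinished class.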



> Backport resource types/GPU features to branch-2
> ------------------------------------------------
>
>                 Key: YARN-8200
>                 URL: https://issues.apache.org/jira/browse/YARN-8200
>             Project: Hadoop YARN
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>         Attachments: YARN-8200-branch-2.001.patch, counter.scheduler.operation.allocate.csv.defaultResources, counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support deep learning workloads. However, our main production clusters are running older versions of branch-2 (2.7 in our case). To avoid supporting too many very different Hadoop versions across multiple clusters, we would like to backport the resource types/resource profiles feature to branch-2, as well as the GPU-specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in YARN-7069 based on issues we uncovered, and the backport was fairly smooth. We also did a trial backport of most of YARN-6223 (sans Docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

