hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "genericqa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-8382) cgroup file leak in NM
Date Sun, 03 Jun 2018 05:12:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499310#comment-16499310
] 

genericqa commented on YARN-8382:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 30s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red}
The patch doesn't appear to include any new or modified tests. Please justify why no new tests
are needed for this patch. Also please list what manual steps were performed to verify this
patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m  9s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 35s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  8s{color}
| {color:green} branch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 49s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 24s{color} |
{color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 32s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 50s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 50s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 18s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 29s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 20s{color}
| {color:green} patch has no errors when building and testing our client artifacts. {color}
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 55s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 43s{color} | {color:green}
hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 52s{color} | {color:black}
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8382 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926265/YARN-8382.002.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient
 findbugs  checkstyle  |
| uname | Linux 16da4ffcae7b 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a804b7c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20932/testReport/ |
| Max. process+thread count | 409 (vs. ulimit of 10000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20932/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> cgroup file leak in NM
> ----------------------
>
>                 Key: YARN-8382
>                 URL: https://issues.apache.org/jira/browse/YARN-8382
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>         Environment: we write an container with a shutdownHook which has a piece of
code like  "while(true) sleep(100)" . when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms <* *yarn.nodemanager.sleep-delay-before-sigkill.ms
, cgourp file leak happens; when* *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
>* ** *yarn.nodemanager.sleep-delay-before-sigkill.ms, cgroup file is deleted successfully***
>            Reporter: Hu Ziqian
>            Assignee: Hu Ziqian
>            Priority: Major
>         Attachments: YARN-8382-branch-2.8.3.001.patch, YARN-8382-branch-2.8.3.002.patch,
YARN-8382.001.patch, YARN-8382.002.patch
>
>
> As Jiandan said in YARN-6525, NM may delete  Cgroup container file timeout with logs
like below:
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to
delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to delete for 1000ms
>  
> we found one situation is that when we set *yarn.nodemanager.sleep-delay-before-sigkill.ms*
bigger than *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms*, the cgroup
file leak happens *.* 
>  
> One container process tree looks like follow graph:
> bash(16097)───java(16099)─┬─\{java}(16100) 
>                                                   ├─\{java}(16101) 
> {{                       ├─\{java}(16102)}}
>  
> {{when NM kills a container, NM sends kill -15 -pid to kill container process group.
Bash process will exit when it received sigterm, but java process may do some job (shutdownHook
etc.), and doesn't exit unit receive sigkill. And when bash process exits, CgroupsLCEResourcesHandler
begin to try to delete cgroup files. So when *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* arrived,
the java processes may still running and cgourp/tasks still not empty and cause a cgroup file
leak.}}
>  
> {{we add a condition that *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* must
bigger than *yarn.nodemanager.sleep-delay-before-sigkill.ms* to solve this problem.}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org


Mime
View raw message