hadoop-yarn-issues mailing list archives

From "Jun Gong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4536) DelayedProcessKiller may not work under heavy workload
Date Tue, 05 Jan 2016 01:24:39 GMT

    [ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082168#comment-15082168
] 

Jun Gong commented on YARN-4536:
--------------------------------

[~gu chi] Thanks for explaining it. Yes, we also came across the problem and have applied
the patch in YARN-4459; it works well now. I explained more in that issue's comments. Maybe
you could help review and try it. Thanks.

> DelayedProcessKiller may not work under heavy workload
> ------------------------------------------------------
>
>                 Key: YARN-4536
>                 URL: https://issues.apache.org/jira/browse/YARN-4536
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: gu-chi
>
> I am now facing orphaned container processes. Here is the scenario:
> Under heavy task load, CPU usage on the NM machine can reach almost 100%. When a container
> receives a kill event, it gets {{SIGTERM}}, and then the parent process exits, leaving the
> container process to the OS. The container process needs to handle some shutdown events or
> other logic but can hardly get any CPU. We would expect to see a {{SIGKILL}}, since there is
> a {{DelayedProcessKiller}}, but the parent process, whose pid was persisted as the container
> pid, no longer exists, so the kill command cannot reach the container process. This is how
> orphaned container processes arise.
> The orphaned process does exit after some time, but the period can be very long and makes
> the OS state worse. As I observed, it can last several hours.
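
The scenario quoted above can be reproduced outside YARN with plain POSIX signals. The following is a minimal sketch in Python, assuming a Linux-like host with `sh` and `sleep` available. It is not the NodeManager code and not the YARN-4459 patch. The names `spawn_launcher`, `signal_pid`, `signal_group` and `delayed_kill` are hypothetical, and the child ignoring SIGTERM only stands in for a process that is too starved of CPU to finish its shutdown. It shows that once the recorded parent pid is gone, a signal addressed to that pid has no target, while a signal addressed to the process group still reaches the surviving child.

```python
import os
import signal
import subprocess
import time


def spawn_launcher():
    """Start a hypothetical launcher that exits at once but leaves a child behind.

    start_new_session=True calls setsid(), so the launcher's pid doubles as
    the process-group id that its descendants inherit.
    """
    # The ignored SIGTERM is inherited by the background sleep.
    script = 'trap "" TERM; sleep 60 & exit 0'
    return subprocess.Popen(
        ["sh", "-c", script],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )


def signal_pid(pid, sig):
    """Signal a single pid. Return False if no such process exists."""
    try:
        os.kill(pid, sig)
        return True
    except ProcessLookupError:
        return False


def signal_group(pgid, sig):
    """Signal every process in a group. Return False if the group is empty."""
    try:
        os.killpg(pgid, sig)
        return True
    except ProcessLookupError:
        return False


def delayed_kill(pgid, delay_seconds):
    """Send SIGTERM to the group, wait, then send SIGKILL to the group.

    Returns a (term_delivered, kill_delivered) tuple.
    """
    term_delivered = signal_group(pgid, signal.SIGTERM)
    time.sleep(delay_seconds)
    kill_delivered = signal_group(pgid, signal.SIGKILL)
    return term_delivered, kill_delivered


if __name__ == "__main__":
    launcher = spawn_launcher()
    launcher.wait()  # the recorded pid is reaped; its child lives on
    print("by pid:", signal_pid(launcher.pid, signal.SIGKILL))
    print("by group:", delayed_kill(launcher.pid, 0.2))
```

The sketch sends SIGKILL after a fixed delay without checking whether the group has already exited, which keeps it short. A real implementation would likely want to check first and handle permission errors.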



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
