hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
Date Wed, 16 May 2018 01:15:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476682#comment-16476682

Haibo Chen commented on YARN-4599:

Thanks [~miklos.szegedi@cloudera.com] for updating the patch! The new documentation, how to
configure is very clear and easy to understand to me. A few follow-up comments.
{quote}You are mixing the parent with the child. The parent gets the child pid and the child
gets 0 after the fork() since the child can just call getpid(). It forks a child process gets
it's pid in the parent and adds that to a cgroup. Once the child notices that it is in the
cgroup it starts eating memory triggering an OOM
Reading the code once more, test_oom in oom_listener_main.c happens to have the same name
as oom_listener_test_main.c. The test_oom() function in oom_listener_main.c seems to be a
helper function that simulates OOM situation. Let's add more comments to it so that it is
clearly indicated.

The DefaultOOMHandler checks both physical and virtual memory regardless of whether we'd want
to just enforce physical or virtural memory. We can make a flag and checks one only dependent
on whether physical or virtual memory is enabled.

The  ContainerSignalContext.equals() is just for the unit test. We could either complete
it by including more fields, or we can use Argument Matcher in TestDefaultOOMHandler.java
as an alternative.

The way that the watchdog in the elastic controller can stop the main controller thread from
listening is not quite straightforward. How about we add a comment before the while(read())
loop to say read() returns -1 if the process is destroye?

> Set OOM control for memory cgroups
> ----------------------------------
>                 Key: YARN-4599
>                 URL: https://issues.apache.org/jira/browse/YARN-4599
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.9.0
>            Reporter: Karthik Kambatla
>            Assignee: Miklos Szegedi
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, YARN-4599.001.patch,
YARN-4599.002.patch, YARN-4599.003.patch, YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch,
YARN-4599.007.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly set OOM control
so that containers are not killed as soon as they go over their usage. Today, one could set
the swappiness to control this, but clusters with swap turned off exist.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message