hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5987) NM configured command to collect heap dump of preempted container
Date Tue, 03 Jan 2017 18:37:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795781#comment-15795781 ]

Miklos Szegedi commented on YARN-5987:

Thank you [~templedf] for the info! YARN-2261 is about adding a cleanup container for an entire
application, whereas this JIRA is about adding a cleanup script for every container.
I read the design there, and it suggests to me two points to discuss. One is whether we want
to run the cleanup callback in a container, and the other is whether we want to do retries.

1. If we used a separate container for the callback, it might fail due to resource constraints,
which would prevent collecting a useful dump file. The callback has to run while the original
container is alive. One container accessing another would also raise container isolation
concerns, I think.

2. If we do not run the callback in a container, it cannot be preempted, so I think we do
not need any retry logic either. We can reconsider this if a usage pattern other
than collecting a dump emerges in the future. However, the callback itself can implement
retry logic in the current implementation, if necessary.
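
To illustrate the last point, a callback could wrap its own dump command in a retry loop,
so the node manager needs no retry logic of its own. The sketch below is only an assumption about
what such a script might look like; it is not taken from the attached patches. The function
names, the <pid> <dump_path> calling convention, and the choice of jmap as the dump tool are
all hypothetical.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_seconds=1.0):
    """Run cmd up to `attempts` times; return True on the first zero exit code."""
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, check=False)
            if result.returncode == 0:
                return True
        except OSError:
            # The command could not be started (for example, binary not found).
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


def build_dump_command(pid, dump_path):
    # Assumption: the container runs a JVM and the JDK's jmap tool is on PATH.
    return ["jmap", "-dump:format=b,file={}".format(dump_path), str(pid)]


if __name__ == "__main__":
    # Hypothetical calling convention: <script> <pid> <dump_path>.
    pid, dump_path = sys.argv[1], sys.argv[2]
    ok = run_with_retries(build_dump_command(pid, dump_path))
    sys.exit(0 if ok else 1)
```

Keeping the retry policy inside the script means each deployment can tune it
without changing the node manager.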

> NM configured command to collect heap dump of preempted container
> -----------------------------------------------------------------
>                 Key: YARN-5987
>                 URL: https://issues.apache.org/jira/browse/YARN-5987
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>         Attachments: YARN-5987.000.patch, YARN-5987.001.patch
> The node manager can kill a container if it exceeds its assigned memory limits. It would
> be nice to have a configuration entry to set up a command that can collect additional debug
> information if needed. The collected information can be used for root cause analysis.

