hadoop-mapreduce-issues mailing list archives

From "Ahmed Radwan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4284) Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-job basis
Date Thu, 24 May 2012 01:57:41 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282115#comment-13282115 ]

Ahmed Radwan commented on MAPREDUCE-4284:

Just to elaborate: the default for this property is 0, so these container dirs are deleted
as soon as the job finishes. In a test cluster we can set the property to a relatively high
value to be able to inspect container logs/local dirs. But how can we do that in a production
cluster? The problem is that any change to this property affects all jobs, and the change
requires restarting all the NodeManagers in the whole cluster. Both consequences are bad:
keeping all dirs for all jobs is expensive from a storage perspective, and restarting the
NMs is expensive from an operations perspective.

So one possible solution is to make the scope of this property per-job (or add another
per-job property), so that the user can set this value as a hint to the NM to keep the dirs
for this individual job. We can still keep a NodeManager property to override or cap the delay.
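
To make the "cap" variant of this idea concrete, here is a minimal sketch of how an
effective delay could be computed from a per-job hint and a NodeManager-side limit. This is
only an illustration: the class name, method name, and parameters are all hypothetical and
do not exist in Hadoop, and treating a cap of 0 as "retention disabled" is an assumption of
mine, not something decided on this issue.

```java
// Hypothetical sketch only: DebugDelayPolicy and effectiveDelaySec are
// illustrative names, not part of any Hadoop or YARN API.
final class DebugDelayPolicy {
    private DebugDelayPolicy() {}

    /**
     * Computes how long (in seconds) container dirs would be kept for one job.
     *
     * @param jobRequestedSec per-job hint; values <= 0 mean the job asked for nothing
     * @param nmCapSec        NodeManager-side upper bound; values <= 0 are assumed
     *                        to disable retention entirely
     */
    static long effectiveDelaySec(long jobRequestedSec, long nmCapSec) {
        if (jobRequestedSec <= 0 || nmCapSec <= 0) {
            // Delete right away, which matches today's default of 0.
            return 0L;
        }
        // The job's hint is honored, but never beyond the NodeManager cap.
        return Math.min(jobRequestedSec, nmCapSec);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelaySec(600, 3600));   // prints 600
        System.out.println(effectiveDelaySec(86400, 3600)); // prints 3600
        System.out.println(effectiveDelaySec(0, 3600));     // prints 0
    }
}
```

With something along these lines, only the jobs that ask for it keep their dirs, and the
storage cost stays bounded by whatever limit the cluster admins choose.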

Arun, what do you think?
> Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-job basis
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-4284
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4284
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
> The yarn.nodemanager.delete.debug-delay-sec property is helpful in debugging jobs (inspecting
container logs/local dirs after the job finishes). Currently it is a nodemanager property,
and changing it requires restarting the nodemanager. In a production cluster this can be a
real problem. It is better to have this property set on a per-job basis, without requiring
a restart of the nodemanagers.
