hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4284) Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-job basis
Date Fri, 25 May 2012 12:24:23 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283335#comment-13283335 ]

Arun C Murthy commented on MAPREDUCE-4284:


bq. Still, I would say this is a property to be used in development clusters.

If this is only needed for development clusters, then we can just use the global setting and
make it very high (e.g. 3 days).
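
As a rough illustration of the "global setting, very high" approach, the sketch below converts a 3-day retention period into seconds and renders the corresponding property entry. It assumes the entry is placed in yarn-site.xml on each NodeManager; the helper name `delay_property` is hypothetical and only serves this example.

```python
from datetime import timedelta

# Property named in this issue; its value is expressed in seconds.
PROPERTY_NAME = "yarn.nodemanager.delete.debug-delay-sec"


def delay_property(retention: timedelta) -> str:
    """Render a Hadoop-style <property> entry for the given retention period.

    Hypothetical helper for illustration; assumes the result is pasted into
    yarn-site.xml on each NodeManager.
    """
    seconds = int(retention.total_seconds())
    return (
        "<property>\n"
        f"  <name>{PROPERTY_NAME}</name>\n"
        f"  <value>{seconds}</value>\n"
        "</property>"
    )


# Example: keep container logs and local dirs for 3 days after the job finishes.
snippet = delay_property(timedelta(days=3))
print(snippet)
```

As the issue description notes, this is a nodemanager-level property, so changing it still requires restarting the nodemanagers.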

bq.  Or, in order to make it more production friendly there should be a MAX_TIME_TO_KEEP_FILES
property in the NM and jobs can set any value up to that time.

Then you pretty much have to have a limit on file-sizes, number of files etc. which leads
exactly to MAPREDUCE-1100, something which we've been trying to avoid by durably storing logs
in HDFS and not on the NM local disk.


To recap, if this is just for debugging, we can set the global limit very high and not bother
with per-job limits.

In any case, we have all task logs on HDFS, so I really don't see the need to reinvent MAPREDUCE-1100.


@Ahmed - Your proposal doesn't work because the NodeManager doesn't load the jobConf of the
container; this would require changes to the ContainerManager protocol.

> Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-job basis
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-4284
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4284
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
> The yarn.nodemanager.delete.debug-delay-sec property is helpful in debugging jobs (inspecting
container logs/local dirs after the job finishes). Currently it is a nodemanager property
and changing it requires restarting the nodemanager. In a production cluster this can be a
real problem. It would be better to set this property on a per-job basis, without requiring
a restart of the nodemanagers.


