hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4284) Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-job basis
Date Fri, 25 May 2012 12:24:23 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283335#comment-13283335 ]

Arun C Murthy commented on MAPREDUCE-4284:
------------------------------------------

@Tucu:

bq. Still, I would say this is a property to be used in development clusters.

If this is only needed for development clusters, then we can just use the global setting and
set it very high (e.g. 3 days).
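
As a rough illustration of the global approach, the sketch below converts a retention period in days to seconds and renders the matching property entry for the NodeManager configuration. The property name comes from this issue; the helper name debug_delay_property is made up for the example, and placing the entry in yarn-site.xml is assumed.

```python
from datetime import timedelta
import xml.etree.ElementTree as ET

# Property name as given in the issue title.
PROPERTY_NAME = "yarn.nodemanager.delete.debug-delay-sec"


def debug_delay_property(days: int) -> str:
    """Render a <property> entry that keeps debug files for `days` days.

    Hypothetical helper for illustration; the value is expressed in seconds.
    """
    seconds = int(timedelta(days=days).total_seconds())
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = PROPERTY_NAME
    ET.SubElement(prop, "value").text = str(seconds)
    return ET.tostring(prop, encoding="unicode")


# 3 days = 3 * 86400 = 259200 seconds
print(debug_delay_property(3))
```

As the issue description notes, this is a NodeManager-level property, so changing it currently means restarting the NodeManagers.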

bq. Or, to make it more production-friendly, there should be a MAX_TIME_TO_KEEP_FILES
property in the NM, and jobs can set any value up to that time.

Then you pretty much have to have limits on file sizes, number of files, etc., which leads
straight back to MAPREDUCE-1100, something we've been trying to avoid by durably storing logs
in HDFS and not on the NM local disk.
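
For concreteness, the capped scheme quoted above would amount to something like the following sketch. Only MAX_TIME_TO_KEEP_FILES appears in the proposal; the function and parameter names are hypothetical, and nothing like this is shown in the thread as existing NodeManager behaviour.

```python
def effective_delay_sec(requested_sec: int, max_time_to_keep_files_sec: int) -> int:
    """Clamp a job's requested debug delay to the NodeManager-wide maximum.

    Hypothetical illustration of the proposal: a job may ask for any
    non-negative delay, but never gets more than the configured cap.
    """
    if requested_sec < 0:
        raise ValueError("requested delay must be non-negative")
    return min(requested_sec, max_time_to_keep_files_sec)


# A job asking for a week against a 3-day cap is held to the cap.
cap = 3 * 24 * 60 * 60
print(effective_delay_sec(7 * 24 * 60 * 60, cap))
```

Note that a cap on time alone bounds neither the size nor the number of files kept on local disk, which is why the other limits come into play.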

----

To recap, if this is just for debugging, we can set the global limit very high and not bother
with per-job limits.

In any case, we have all task logs on HDFS, so I really don't see the need to reinvent MAPREDUCE-1100.

----

@Ahmed - Your proposal doesn't work because the NodeManager doesn't load the jobConf of the
container; this would require changes to the ContainerManager protocol.

                
> Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-job basis
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4284
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4284
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>
> The yarn.nodemanager.delete.debug-delay-sec property is helpful in debugging jobs (inspecting
> container logs/local dirs after the job finishes). Currently it is a nodemanager property,
> and changing it requires restarting the nodemanager. In a production cluster this can be a
> real problem. It would be better to set this property on a per-job basis, without requiring
> a restart of the nodemanagers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
