hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-927) Cleanup of task-logs should happen in TaskTracker instead of the Child
Date Tue, 09 Feb 2010 04:03:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831285#action_12831285 ]

Hemanth Yamijala commented on MAPREDUCE-927:

bq. This says user logs will be maintained for userLogRetainsHours duration after the job

It seems reasonable to change the semantics of userLogRetainsHours so that the retention period starts at
*job* completion instead of *task* completion. Logs are typically retained for debugging, log collection,
analysis and so on, and for all of these use cases it is more useful to keep all of a job's logs
than only a few tasks' logs. The one catch is that this will increase the demand for disk space,
because logs will be retained for longer overall. I doubt this is a big concern, though, given the
following assumptions:
- Most tasks of a job finish within a reasonable amount of time; long tails that would really
make a difference to the log space used (hours instead of minutes for a few tasks) can be considered exceptions.
- With the proposals of MAPREDUCE-1100, we have ways to control the amount of logs retained
and can use them to offset the increased demand for space.
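To make the trade-off concrete, here is a minimal sketch, assuming retention is measured in whole hours and that a job "completes" when its last task finishes. The class `LogRetention` and its methods are hypothetical illustrations, not Hadoop code. It computes when each task's logs become deletable under both semantics, and how much extra retention time the job-completion semantics costs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch comparing task-based and job-based log retention. */
final class LogRetention {
    private LogRetention() {}

    /**
     * Returns, for each task, the time (ms) at which its logs may be deleted.
     * fromJobCompletion == false: the clock starts at each task's own finish time.
     * fromJobCompletion == true:  the clock starts when the last task (the job) finishes.
     * Assumes a non-empty list; Collections.max throws on an empty one.
     */
    static List<Long> deletionTimes(List<Long> taskFinishMillis, long retainHours,
                                    boolean fromJobCompletion) {
        long retainMillis = TimeUnit.HOURS.toMillis(retainHours);
        long jobFinish = Collections.max(taskFinishMillis);
        List<Long> result = new ArrayList<>();
        for (long finish : taskFinishMillis) {
            long start = fromJobCompletion ? jobFinish : finish;
            result.add(start + retainMillis);
        }
        return result;
    }

    /** Extra time (ms), summed over tasks, that logs are kept under job-completion semantics. */
    static long extraRetentionMillis(List<Long> taskFinishMillis) {
        long jobFinish = Collections.max(taskFinishMillis);
        long extra = 0;
        for (long finish : taskFinishMillis) {
            extra += jobFinish - finish; // zero for the task that finishes last
        }
        return extra;
    }
}
```

With tasks finishing at 0, 10 minutes and 3 hours, every task's logs survive until 3 hours plus the retention period under the new semantics. The extra retention is dominated by the single long-tail task, which matches the assumption above that long tails are the main factor in the added disk usage.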

bq. If it is a job level parameter, we would need a TaskTracker parameter as an upper bound
for the job configuration to control the disk space.

I would not take this approach. It is high time Configuration provided a mechanism to define
allowed ranges for configuration items with bounded values. For the time being, I would rather
let administrators lock down userLogRetainHours as a final value where this is required.
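In Hadoop site files, a property is locked by adding `<final>true</final>` to its `<property>` element, after which later resources such as a job's configuration cannot override it. The sketch below models only that "first final value wins" behaviour in plain Java so that the lock-down semantics can be seen in isolation. `LayeredConfig` is a hypothetical class, not Hadoop's `Configuration`, and the key `userLogRetainHours` is taken from this comment rather than from an actual property name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of "final" configuration parameters:
 * once an earlier layer (e.g. the cluster-wide site file) marks a key final,
 * later layers (e.g. a job's configuration) cannot override it.
 */
final class LayeredConfig {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    /** Applies one setting; returns false if it was ignored because the key is final. */
    boolean set(String key, String value, boolean isFinal) {
        if (finalKeys.contains(key)) {
            return false; // locked by an earlier layer; keep the administrator's value
        }
        values.put(key, value);
        if (isFinal) {
            finalKeys.add(key);
        }
        return true;
    }

    String get(String key) {
        return values.get(key);
    }
}
```

An administrator who sets `userLogRetainHours` to 24 as final gets a hard cap without any range-checking support in Configuration itself. The limitation, and the reason bounded ranges would still be useful, is that jobs cannot even choose a *smaller* value.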

> Cleanup of task-logs should happen in TaskTracker instead of the Child
> ----------------------------------------------------------------------
>                 Key: MAPREDUCE-927
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security, tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Priority: Blocker
>             Fix For: 0.21.0
> Task-log cleanup is currently done in the Child. This is undesirable for at least two reasons:
1) failures during cleanup affect the user's tasks, and 2) the task's wall time is
inflated by operations that the TT should actually own.
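One way to picture the proposed split is a cleaner owned by the TaskTracker: jobs are registered when they complete, and a periodic pass deletes log directories whose retention period has elapsed. The sketch below assumes one log directory per job and a caller-supplied clock. `TaskTrackerLogCleaner`, `markJobCompleted` and `cleanExpired` are hypothetical names, not the actual TaskTracker API. Deletion failures are caught inside the cleaner, so they never reach a running task, and cleanup time is never charged to a task's wall time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of TaskTracker-side (not Child-side) task-log cleanup. */
final class TaskTrackerLogCleaner {
    private final long retainMillis;
    // Job log directory -> time (ms) after which it may be deleted.
    private final Map<Path, Long> deadlines = new HashMap<>();

    TaskTrackerLogCleaner(long retainMillis) {
        this.retainMillis = retainMillis;
    }

    /** Called by the tracker when a job completes; starts that job's retention clock. */
    synchronized void markJobCompleted(Path jobLogDir, long completedAtMillis) {
        deadlines.put(jobLogDir, completedAtMillis + retainMillis);
    }

    /** One cleanup pass; returns the number of job log directories deleted. */
    synchronized int cleanExpired(long nowMillis) {
        int deleted = 0;
        Iterator<Map.Entry<Path, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Path, Long> entry = it.next();
            if (entry.getValue() > nowMillis) {
                continue; // still within the retention period
            }
            try {
                deleteRecursively(entry.getKey());
                it.remove();
                deleted++;
            } catch (IOException | UncheckedIOException e) {
                // Failure stays inside the tracker; keep the entry and retry next pass.
                System.err.println("log cleanup failed for " + entry.getKey() + ": " + e);
            }
        }
        return deleted;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

In a real tracker, `cleanExpired` would presumably run on a background thread or timer, independent of any task JVM. That independence is what addresses both reasons given in the description.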

This message is automatically generated by JIRA.
