hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4654) remove temporary output directory of failed tasks
Date Tue, 25 Nov 2008 06:41:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650477#action_12650477
] 

Amareshwari Sriramadasu commented on HADOOP-4654:
-------------------------------------------------

The problem arises because the temporary output directories of failed tasks are cleaned up
only at the end of the job.

Through 0.18.x, task commit is done by the TaskCommitThread in the JobTracker (JT). If needed
for 0.18, the cleanup functionality can be added there.

After HADOOP-3150, task commit is exposed to the user, and each task commits at the end of its
execution. To remove the temporary output directories of failed/killed tasks as soon as they fail,
we should consider the following:
1. A failure/kill can happen anywhere between launching the task and committing the task.
2. A failure/kill can be caused by a KillTaskAction or by an Exception/Error.

Owen's suggestion on HADOOP-3150 at http://issues.apache.org/jira/browse/HADOOP-3150?focusedCommentId=12626372#action_12626372
and http://issues.apache.org/jira/browse/HADOOP-3150?focusedCommentId=12628736#action_12628736
_to have task commit as separate task_ looks like the right approach here.
For successful tasks, a commit task will be launched to commit the task output. For failed/killed
tasks, an abort task will be launched to clean up the task's temporary output.
Thoughts?
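
To make the proposal concrete, here is a minimal sketch of the commit/abort split in plain Java.
It is not Hadoop code: the class TaskCleanupSketch, the enums AttemptState and FollowUp, and the
directory names are hypothetical, and a local file system stands in for DFS. It only shows the
idea that a successful attempt is followed by a commit step, while a failed or killed attempt is
followed immediately by an abort step that deletes its temporary output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only: none of these names are Hadoop APIs.
class TaskCleanupSketch {

    // Final state of a task attempt (hypothetical enum).
    enum AttemptState { SUCCEEDED, FAILED, KILLED }

    // Follow-up work scheduled once an attempt finishes (hypothetical enum).
    enum FollowUp { COMMIT, ABORT }

    // Successful attempts get a commit step; failed or killed attempts get an abort step.
    static FollowUp followUpFor(AttemptState state) {
        return state == AttemptState.SUCCEEDED ? FollowUp.COMMIT : FollowUp.ABORT;
    }

    // Commit: promote the temporary directory to the final output location.
    // Assumes both paths are on the same file system, so the move is a rename.
    static void commit(Path tempDir, Path finalDir) throws IOException {
        Files.move(tempDir, finalDir);
    }

    // Abort: delete the temporary directory and everything under it.
    // A no-op if the attempt died before creating any output.
    static void abort(Path tempDir) throws IOException {
        if (!Files.exists(tempDir)) {
            return;
        }
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(tempDir)) {
            // Reverse order puts children before their parent directories.
            deepestFirst = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            Files.delete(p);
        }
    }

    // Run the follow-up step as soon as the attempt's final state is known.
    static void finish(AttemptState state, Path tempDir, Path finalDir) throws IOException {
        if (followUpFor(state) == FollowUp.COMMIT) {
            commit(tempDir, finalDir);
        } else {
            abort(tempDir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("job-output");
        Path temp = Files.createDirectories(root.resolve("_temporary").resolve("attempt_1"));
        Files.write(temp.resolve("part-00000"), new byte[] {1, 2, 3});

        // A failed attempt is cleaned up right away rather than at the end of the job.
        finish(AttemptState.FAILED, temp, root.resolve("part-dir"));
        System.out.println("temp dir still exists: " + Files.exists(temp));
    }
}
```

Because abort() tolerates a missing directory, the same step can run no matter where between
launch and commit the attempt failed, and whether the cause was a kill action or an exception.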

> remove temporary output directory of failed tasks
> -------------------------------------------------
>
>                 Key: HADOOP-4654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4654
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2, 0.18.1
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When dfs is getting full (80+% of reserved space), the rate of write failures increases,
> such that more map-reduce tasks can fail. By not cleaning up the temporary output directory
> of tasks the situation worsens over the lifetime of a job, increasing the probability of the
> whole job failing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

