hadoop-common-user mailing list archives

From Amareshwari Sriramadasu <amar...@yahoo-inc.com>
Subject Re: intermediate files of killed tasks not purged
Date Tue, 28 Apr 2009 09:09:16 GMT
Again, where are you seeing the attemptid directories? Are they at 
mapred/local/<attemptid> or at 
If you are seeing files at mapred/local/<attemptid>, then it is a bug. 
Please raise a JIRA and attach the TaskTracker logs if possible.
If not, mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories 
are cleaned up on a KillTaskAction, and 
mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a 
KillJobAction. Can you verify from the TaskTracker logs whether the attemptid 
got a KillTaskAction or the jobid got a KillJobAction? If not, this is fixed by 


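To answer the question of where the attempt directories actually live, a minimal Python sketch along these lines could list them in both locations, with sizes. It is only an illustration: the function names are hypothetical, the name pattern is inferred from the single example in this thread ("attempt_200904262046_0026_m_000002_0"), and the directory passed in is assumed to be <hadoop-tmp-dir>/mapred/local on one TaskTracker node.

```python
import os
import re

# Pattern inferred from the one example name quoted in this thread
# (attempt_200904262046_0026_m_000002_0); treat it as an assumption.
ATTEMPT_RE = re.compile(r"^attempt_\d+_\d+_[mr]_\d+_\d+$")


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total


def attempt_dirs_in(parent):
    """(path, size) for each attempt-named directory directly under parent."""
    found = []
    for name in sorted(os.listdir(parent)):
        path = os.path.join(parent, name)
        if os.path.isdir(path) and ATTEMPT_RE.match(name):
            found.append((path, dir_size(path)))
    return found


def find_attempt_dirs(local_dir):
    """Return (stray, cached).

    stray:  attempt directories directly under local_dir
            (the location described above as a bug).
    cached: attempt directories under taskTracker/jobCache/<jobid>/
            (the expected location).
    """
    stray = attempt_dirs_in(local_dir)
    cached = []
    job_cache = os.path.join(local_dir, "taskTracker", "jobCache")
    if os.path.isdir(job_cache):
        for job in sorted(os.listdir(job_cache)):
            job_path = os.path.join(job_cache, job)
            if os.path.isdir(job_path):
                cached.extend(attempt_dirs_in(job_path))
    return stray, cached
```

Calling find_attempt_dirs() on the local directory and printing both lists shows whether the leftovers sit directly under mapred/local or under jobCache. The sketch only reads the file system and deletes nothing.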
Sandhya E wrote:
> Hi Amareshwari
> We are on version 0.18. I verified from the JobTracker web UI that not
> all killed tasks have leftovers in mapred/local. Also, some tasks that
> were successful have left their tmp folders in mapred/local.
> Can you please give some pointers on how to debug it further?
> Regards
> Sandhya
> On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
> <amarsri@yahoo-inc.com> wrote:
>> Hi Sandhya,
>>  Which version of Hadoop are you using? There could be <attempt_id>
>> directories in mapred/local before 0.17. Now, there should not be any such
>> directories.
>> From version 0.17 onwards, the attempt directories will be present only at
>> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are seeing
>> directories in any other location, then it seems like a bug.
>> HADOOP-4654 is to clean up temporary data in DFS for failed tasks; it does
>> not change local FileSystem files.
>> Thanks
>> Amareshwari
>> Edward J. Yoon wrote:
>>> Hi,
>>> It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.
>>> On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sandhyabhaskar@gmail.com>
>>> wrote:
>>>> Hi
>>>> Under <hadoop-tmp-dir>/mapred/local there are directories like
>>>> "attempt_200904262046_0026_m_000002_0"
>>>> Each of these directories contains files named like: intermediate.1
>>>> intermediate.2  intermediate.3  intermediate.4  intermediate.5
>>>> There are many directories in this format. All of them correspond to
>>>> killed task attempts. As they contain huge intermediate files, we
>>>> ran into disk space issues.
>>>> They are cleaned up when the mapred cluster is restarted. But otherwise,
>>>> how can these be cleaned up without having to restart the cluster?
>>>> Conf parameter "keep.failed.task.files" is set to "false" in our case.
>>>> Many Thanks
>>>> Sandhya
