hadoop-common-user mailing list archives

From Amareshwari Sriramadasu <amar...@yahoo-inc.com>
Subject Re: intermediate files of killed tasks not purged
Date Tue, 28 Apr 2009 09:09:16 GMT
Again, where are you seeing the attemptid directories? Are they at
mapred/local/<attemptid> or at
mapred/local/taskTracker/jobCache/<jobid>/<attemptid>?
If you are seeing files at mapred/local/<attemptid>, then it is a bug.
Please raise a JIRA and attach the TaskTracker logs if possible.
If not, mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories
are cleaned up on a KillTaskAction and
mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a
KillJobAction. Can you verify from the TaskTracker logs whether the
attemptid got a KillTaskAction or the jobid got a KillJobAction? If not,
this is fixed by HADOOP-5247.
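
To tell the two locations described above apart on a node, something like the following Python sketch could help. It is only an illustration, not part of Hadoop: the function names are hypothetical, and the attempt-name pattern is an assumption generalized from the single example in this thread ("attempt_200904262046_0026_m_000002_0"). It lists attempt directories that sit directly under a given mapred/local directory (the unexpected location) and ignores anything nested under taskTracker/jobCache.

```python
import os
import re

# Assumed naming pattern, generalized from the one example in the thread:
# attempt_<timestamp>_<job number>_<m or r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"^attempt_\d+_\d+_[mr]_\d+_\d+$")


def find_stray_attempt_dirs(local_dir):
    """Return attempt directories located directly under local_dir.

    Only the top level is inspected, so directories under
    taskTracker/jobCache/<jobid>/ are never reported.
    """
    stray = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if os.path.isdir(path) and ATTEMPT_RE.match(name):
            stray.append(path)
    return stray


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            total += os.path.getsize(os.path.join(root, filename))
    return total
```

Calling find_stray_attempt_dirs() with the node's mapred/local path and passing each result to dir_size_bytes() shows which leftover attempts take up the most disk space. The sketch only reports; it deletes nothing.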

Thanks
Amareshwari

Sandhya E wrote:
> Hi Amareshwari
>
> We are on version 0.18. I verified from the jobtracker web page that not
> all killed tasks have leftovers in mapred/local. Also, some tasks that
> were successful have left their tmp folders in mapred/local.
>
> Can you please give some pointers on how to debug this further?
>
> Regards
> Sandhya
>
> On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
> <amarsri@yahoo-inc.com> wrote:
>   
>> Hi Sandhya,
>>
>> Which version of Hadoop are you using? Before 0.17, there could be
>> <attempt_id> directories in mapred/local. Now, there should not be any such
>> directories.
>> From version 0.17 onwards, the attempt directories will be present only at
>> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are seeing the
>> directories in any other location, then it seems like a bug.
>>
>> HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it
>> does not change local FileSystem files.
>>
>> Thanks
>> Amareshwari
>> Edward J. Yoon wrote:
>>     
>>> Hi,
>>>
>>> It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.
>>>
>>> On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sandhyabhaskar@gmail.com>
>>> wrote:
>>>
>>>       
>>>> Hi
>>>>
>>>> Under <hadoop-tmp-dir>/mapred/local there are directories like
>>>> "attempt_200904262046_0026_m_000002_0"
>>>> Each of these directories contains files named like: intermediate.1
>>>> intermediate.2  intermediate.3  intermediate.4  intermediate.5
>>>> There are many such directories, and all of them correspond to
>>>> killed task attempts. As they contain huge intermediate files, we
>>>> ran into disk space issues.
>>>>
>>>> They are cleaned up when the mapred cluster is restarted. But otherwise,
>>>> how can these be cleaned up without having to restart the cluster?
>>>>
>>>> Conf parameter "keep.failed.task.files" is set to "false" in our case.
>>>>
>>>> Many Thanks
>>>> Sandhya

