hadoop-hdfs-user mailing list archives

From Ivan Tretyakov <itretya...@griddynamics.com>
Subject Re: JobCache directory cleanup
Date Thu, 10 Jan 2013 11:47:38 GMT
Thanks for the replies!

Hemanth,
I could see the following exception in the TaskTracker log:
https://issues.apache.org/jira/browse/MAPREDUCE-5
But I'm not sure if it is related to this issue.
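
In case it helps to reproduce, something along these lines should turn it up in
the TaskTracker log (the log path is only an example for a CDH-style layout and
may differ on other installs):

# Look for errors/exceptions in the TaskTracker log on the affected node
# (path is an example; adjust to wherever the TaskTracker logs live):
grep -iE 'exception|error' /var/log/hadoop*/hadoop-*-tasktracker-*.log | tail -n 50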

> Now, when a job completes, the directories under the jobCache must get
> automatically cleaned up. However it doesn't look like this is happening in
> your case.

So, if I have no running jobs, the jobcache directory should be empty. Is that
correct?
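
If so, I guess a quick check like the following on an idle TaskTracker should
show no entries at all (just a sketch, using the same /data?/mapred/local
layout as in my original mail):

# With no jobs running, each jobcache directory should have no job
# subdirectories left:
for d in /data?/mapred/local/taskTracker/*/jobcache; do
  echo "$d: $(ls -1 "$d" 2>/dev/null | wc -l) entries"
done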



On Thu, Jan 10, 2013 at 8:18 AM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> Hi,
>
> The directory name you have provided is /data?/mapred/local/taskTracker/persona/jobcache/.
> This directory is used by the TaskTracker (slave) daemons to localize job
> files when the tasks are run on the slaves.
>
> Hence, I don't think this is related to the parameter "
> mapreduce.jobtracker.retiredjobs.cache.size", which is a parameter
> related to the jobtracker process.
>
> Now, when a job completes, the directories under the jobCache must get
> automatically cleaned up. However it doesn't look like this is happening in
> your case.
>
> Could you please look at the logs of the tasktracker machine where this
> has gotten filled up, to see if there are any errors that could give clues?
> Also, since this is a CDH release, it could be a problem specific to that
> distribution, so reaching out on the CDH mailing lists may also help.
>
> Thanks
> hemanth
>
> On Wed, Jan 9, 2013 at 8:11 PM, Ivan Tretyakov <
> itretyakov@griddynamics.com> wrote:
>
>> Hello!
>>
>> I've found that the jobcache directory has become very large on our cluster, e.g.:
>>
>> # du -sh /data?/mapred/local/taskTracker/user/jobcache
>> 465G    /data1/mapred/local/taskTracker/user/jobcache
>> 464G    /data2/mapred/local/taskTracker/user/jobcache
>> 454G    /data3/mapred/local/taskTracker/user/jobcache
>>
>> And it stores information for about 100 jobs:
>>
>> # ls -1 /data?/mapred/local/taskTracker/persona/jobcache/ | sort | uniq | wc -l
>>
>> I've found that there is the following parameter:
>>
>> <property>
>>   <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
>>   <value>1000</value>
>>   <description>The number of retired job status to keep in the cache.
>>   </description>
>> </property>
>>
>> So, if I got it right, it is intended to control the job cache size by limiting
>> the number of jobs to keep the cache for.
>>
>> Also, I've seen that some Hadoop users use a cron-based approach to clean up
>> the jobcache (a rough sketch of what I mean follows the links below):
>> http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
>>  (
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3C99484d561002100143s4404df98qead8f2cf687a76d0@mail.gmail.com%3E
>> )
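>>
>> For what it's worth, that cron approach could be as simple as a sketch like
>> this (purely illustrative: the script name, the one-day age threshold and the
>> assumption that old directories belong to finished jobs are all mine, so it
>> would need care on a busy cluster):
>>
>> # jobcache-cleanup.sh (hypothetical), run daily from cron
>> # Remove per-job directories under jobcache untouched for more than a day.
>> for d in /data?/mapred/local/taskTracker/*/jobcache; do
>>   find "$d" -mindepth 1 -maxdepth 1 -type d -mtime +1 -exec rm -rf {} +
>> done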
>>
>> Are there other approaches to control the jobcache size?
>> What is the right way to do it?
>>
>> Thanks in advance!
>>
>> P.S. We are using CDH 4.1.1.
>>
>> --
>> Best Regards
>> Ivan Tretyakov
>>
>> Deployment Engineer
>> Grid Dynamics
>> +7 812 640 38 76
>> Skype: ivan.tretyakov
>> www.griddynamics.com
>> itretyakov@griddynamics.com
>>
>
>


-- 
Best Regards
Ivan Tretyakov

Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com
