hadoop-hdfs-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: JobCache directory cleanup
Date Thu, 10 Jan 2013 04:18:30 GMT
Hi,

The directory name you have provided is
/data?/mapred/local/taskTracker/persona/jobcache/.
This directory is used by the TaskTracker (slave) daemons to localize job
files when the tasks are run on the slaves.

Hence, I don't think this is related to the parameter
"mapreduce.jobtracker.retiredjobs.cache.size",
which configures the JobTracker process (how many retired job statuses it
keeps in its cache), not the TaskTrackers' local directories.

When a job completes, its directories under jobcache should be cleaned up
automatically. However, that does not seem to be happening in your case.
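A minimal sketch for checking whether completed jobs are leaving directories behind: it lists each job directory under the jobcache paths from this thread with its size and age, so directories much older than any running job stand out. It assumes the /data?/mapred/local/taskTracker/<user>/jobcache/<job-id> layout seen in this thread; the glob pattern and the function names are illustrative, not part of Hadoop.

```python
import glob
import os
import time


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not descend into symlinked dirs
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                try:
                    total += os.path.getsize(fp)
                except OSError:
                    pass  # file removed while scanning
    return total


def list_job_dirs(pattern):
    """Return (path, size_bytes, age_days) for each job dir in matching jobcache dirs."""
    now = time.time()
    result = []
    for jobcache in sorted(glob.glob(pattern)):
        for entry in sorted(os.listdir(jobcache)):
            path = os.path.join(jobcache, entry)
            if os.path.isdir(path):
                age_days = (now - os.path.getmtime(path)) / 86400
                result.append((path, dir_size(path), age_days))
    return result


if __name__ == "__main__":
    # Assumed pattern, taken from the paths in this thread; adjust to your local dirs.
    for path, size, age in list_job_dirs("/data?/mapred/local/taskTracker/*/jobcache"):
        print(f"{size / 2**30:8.1f} GiB  {age:6.1f} days  {path}")
```

Job directories whose age is far beyond the runtime of your longest job are good candidates to cross-check against the TaskTracker logs.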

Could you please check the logs on the TaskTracker machine whose disks have
filled up, to see whether there are any errors that could give clues?
Also, since this is a CDH release, the problem could be specific to it, so
asking on the CDH mailing lists may also help.
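To help with the log check, here is a rough sketch that filters a TaskTracker log for WARN, ERROR and FATAL lines mentioning jobcache (or any other keywords you pass, such as a job ID). It assumes the default Hadoop log4j layout, in which the level is the third whitespace-separated field after the date and time; the log file location varies by installation, so the path is passed on the command line. Stack-trace continuation lines carry no level and are not matched, so look at the log around each reported line number.

```python
import sys


def suspicious_lines(lines, keywords=("jobcache",)):
    """Yield (line_no, line) for WARN/ERROR/FATAL lines that mention any keyword.

    Assumes a layout like "<date> <time> <LEVEL> <logger>: <message>".
    """
    for no, line in enumerate(lines, start=1):
        fields = line.split()
        level = fields[2] if len(fields) > 2 else ""
        if level in ("WARN", "ERROR", "FATAL") and any(k in line for k in keywords):
            yield no, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_tt_log.py <path-to-tasktracker-log> [keyword ...]
    keys = tuple(sys.argv[2:]) or ("jobcache",)
    with open(sys.argv[1], errors="replace") as f:
        for no, line in suspicious_lines(f, keys):
            print(f"{no}: {line}")
```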

Thanks
hemanth

On Wed, Jan 9, 2013 at 8:11 PM, Ivan Tretyakov
<itretyakov@griddynamics.com> wrote:

> Hello!
>
> I've found that the jobcache directory has become very large on our cluster, e.g.:
>
> # du -sh /data?/mapred/local/taskTracker/user/jobcache
> 465G    /data1/mapred/local/taskTracker/user/jobcache
> 464G    /data2/mapred/local/taskTracker/user/jobcache
> 454G    /data3/mapred/local/taskTracker/user/jobcache
>
> And it stores information for about 100 jobs:
>
> # ls -1 /data?/mapred/local/taskTracker/persona/jobcache/ | sort | uniq | wc -l
>
> I've found the following parameter:
>
> <property>
>   <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
>   <value>1000</value>
>   <description>The number of retired job status to keep in the cache.
>   </description>
> </property>
>
> So, if I understand correctly, it is intended to control the job cache size
> by limiting the number of jobs whose cache is kept.
>
> Also, I've seen that some Hadoop users use a cron-based approach to clean up
> the jobcache:
> http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
> (http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3C99484d561002100143s4404df98qead8f2cf687a76d0@mail.gmail.com%3E)
>
> Are there other approaches to controlling the jobcache size?
> What is the proper way to do it?
>
> Thanks in advance!
>
> P.S. We are using CDH 4.1.1.
>
> --
> Best Regards
> Ivan Tretyakov
>
> Deployment Engineer
> Grid Dynamics
> +7 812 640 38 76
> Skype: ivan.tretyakov
> www.griddynamics.com
> itretyakov@griddynamics.com
>
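The cron approach mentioned in the quoted message is a workaround rather than a fix for the missing automatic cleanup, but if it is needed in the meantime, a cautious sketch could look like this. It reports (and, only with dry_run=False, deletes) job directories whose modification time is older than a threshold. The glob pattern, the 7-day threshold, the script name and the crontab line are all assumptions. A directory's mtime changes only when entries are added or removed directly inside it, so the threshold must be comfortably longer than your longest-running job, or the script may delete files that a running task still needs.

```python
import glob
import os
import shutil
import time


def stale_job_dirs(pattern, max_age_days):
    """Job dirs in matching jobcache dirs whose mtime is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for jobcache in sorted(glob.glob(pattern)):
        for entry in sorted(os.listdir(jobcache)):
            path = os.path.join(jobcache, entry)
            # Only real directories; skip symlinks and stray files.
            if os.path.isdir(path) and not os.path.islink(path) \
                    and os.path.getmtime(path) < cutoff:
                stale.append(path)
    return stale


def cleanup(pattern, max_age_days, dry_run=True):
    """Report (dry run) or delete stale job dirs; return the paths handled."""
    removed = []
    for path in stale_job_dirs(pattern, max_age_days):
        if dry_run:
            print(f"would remove {path}")
        else:
            try:
                shutil.rmtree(path)
            except OSError as exc:
                print(f"failed to remove {path}: {exc}")
                continue
        removed.append(path)
    return removed


if __name__ == "__main__":
    # Hypothetical crontab entry: 0 3 * * * python3 /usr/local/bin/jobcache_cleanup.py
    # Assumed pattern and threshold; switch to dry_run=False only after reviewing output.
    cleanup("/data?/mapred/local/taskTracker/*/jobcache", max_age_days=7, dry_run=True)
```

Running it in dry-run mode first and comparing the output with the list of currently running jobs is the safer way to pick the threshold.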
