hadoop-mapreduce-user mailing list archives

From Ivan Tretyakov <itretya...@griddynamics.com>
Subject Re: JobCache directory cleanup
Date Wed, 09 Jan 2013 15:22:27 GMT
Thanks a lot, Alexander!

What is mapreduce.jobtracker.retiredjobs.cache.size for, then?
Is the cron approach safe for Hadoop? Is it the only way at the moment?


On Wed, Jan 9, 2013 at 6:50 PM, Alexander Alten-Lorenz
<wget.null@gmail.com> wrote:

> Hi,
>
> By default (and this is not configurable), the logs are kept for 30 days.
> This will become configurable in the future
> (https://issues.apache.org/jira/browse/MAPREDUCE-4643).
>
> - Alex
>
> On Jan 9, 2013, at 3:41 PM, Ivan Tretyakov <itretyakov@griddynamics.com>
> wrote:
>
> > Hello!
> >
> > I've found that the jobcache directory has become very large on our
> > cluster, e.g.:
> >
> > # du -sh /data?/mapred/local/taskTracker/user/jobcache
> > 465G    /data1/mapred/local/taskTracker/user/jobcache
> > 464G    /data2/mapred/local/taskTracker/user/jobcache
> > 454G    /data3/mapred/local/taskTracker/user/jobcache
> >
> > And it stores information for about 100 jobs:
> >
> > # ls -1 /data?/mapred/local/taskTracker/persona/jobcache/ | sort | uniq | wc -l
> >
> > I've found the following parameter:
> >
> > <property>
> >  <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
> >  <value>1000</value>
> >  <description>The number of retired job status to keep in the cache.
> >  </description>
> > </property>
> >
> > So, if I understand correctly, it is intended to control the job cache
> > size by limiting the number of jobs for which the cache is stored.
> >
> > I've also seen that some Hadoop users run a cron job to clean up the
> > jobcache:
> >
> > http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
> > (http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3C99484d561002100143s4404df98qead8f2cf687a76d0@mail.gmail.com%3E)
> >
> > Are there other approaches to controlling the jobcache size?
> > What is the proper way to do it?
> >
> > Thanks in advance!
> >
> > P.S. We are using CDH 4.1.1.
> >
> > --
> > Best Regards
> > Ivan Tretyakov
> >
> > Deployment Engineer
> > Grid Dynamics
> > +7 812 640 38 76
> > Skype: ivan.tretyakov
> > www.griddynamics.com
> > itretyakov@griddynamics.com
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
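A minimal sketch of the cron-style cleanup discussed in this thread, written in Python. It is not a Hadoop tool and not the method from the linked posts. It assumes (hypothetically) that each job's data lives in a subdirectory named after its job ID (`job_...`) under each jobcache directory, and it uses that directory's modification time as a crude proxy for "finished long ago". It does not ask the JobTracker whether a job is still running, so the illustrative threshold `MAX_AGE_DAYS` must be comfortably longer than your longest-running job. The names `stale_job_dirs`, `clean_jobcache` and `MAX_AGE_DAYS` are hypothetical.

```python
import glob
import os
import shutil
import time

# Hypothetical threshold: must exceed the runtime of your longest job.
MAX_AGE_DAYS = 7
SECONDS_PER_DAY = 86400


def stale_job_dirs(jobcache_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Return per-job subdirectories of jobcache_dir whose mtime is older
    than max_age_days. A directory's mtime only changes when entries directly
    inside it are added or removed, so this is a rough activity proxy."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for name in sorted(os.listdir(jobcache_dir)):
        path = os.path.join(jobcache_dir, name)
        # Assumption: per-job directories are named after the job ID (job_...).
        if name.startswith("job_") and os.path.isdir(path):
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return stale


def clean_jobcache(jobcache_dirs, max_age_days=MAX_AGE_DAYS, dry_run=True, now=None):
    """Delete (or, with dry_run=True, only list) stale job directories."""
    removed = []
    for jobcache_dir in jobcache_dirs:
        if not os.path.isdir(jobcache_dir):
            continue  # skip disks that have no jobcache directory
        for path in stale_job_dirs(jobcache_dir, max_age_days, now):
            if not dry_run:
                shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Like the du command in the original message, with * in place of the user name.
    dirs = sorted(glob.glob("/data?/mapred/local/taskTracker/*/jobcache"))
    for path in clean_jobcache(dirs, dry_run=True):
        print("would remove", path)
```

Running it first with `dry_run=True` lists what would be deleted without touching anything; only after reviewing that list would one schedule it from cron with `dry_run=False`.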


-- 
Best Regards
Ivan Tretyakov