hadoop-mapreduce-user mailing list archives

From Ivan Tretyakov <itretya...@griddynamics.com>
Subject JobCache directory cleanup
Date Wed, 09 Jan 2013 14:41:58 GMT
Hello!

I've found that jobcache directory became very large on our cluster, e.g.:

# du -sh /data?/mapred/local/taskTracker/user/jobcache
465G    /data1/mapred/local/taskTracker/user/jobcache
464G    /data2/mapred/local/taskTracker/user/jobcache
454G    /data3/mapred/local/taskTracker/user/jobcache

It stores data for about 100 jobs:

# ls -1 /data?/mapred/local/taskTracker/persona/jobcache/ | sort | uniq | wc -l

I've found that there is the following parameter:

<property>
  <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
  <value>1000</value>
  <description>The number of retired job status to keep in the cache.
  </description>
</property>

So, if I got it right, it is intended to control the job cache size by limiting
the number of jobs to keep the cache for.

Also, I've seen that some Hadoop users use a cron-based approach to clean up
the jobcache:
http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
(http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3C99484d561002100143s4404df98qead8f2cf687a76d0@mail.gmail.com%3E)
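For reference, a cron-style cleanup along the lines of that thread might look
like the sketch below. This is only an illustration, not code from the thread:
the function name, the 7-day retention period, and the directory layout are
assumptions based on the du output above, and it does not check whether a job
is still running, so the retention window must be longer than any job's runtime.

```shell
#!/bin/sh
# Hypothetical sketch: remove per-job subdirectories under
# ROOT/taskTracker/*/jobcache that have not been modified for DAYS days.
# Caution (assumption): does not verify that a job has finished; pick a
# retention period safely longer than your longest-running job.
cleanup_jobcache() {
  root=$1
  days=$2
  for dir in "$root"/taskTracker/*/jobcache; do
    [ -d "$dir" ] || continue
    # Only the per-job subdirectories, never the jobcache root itself
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
      -exec rm -rf {} +
  done
}

# Example invocation matching the layout from the du output above:
# cleanup_jobcache /data1/mapred/local 7
```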

Are there other approaches to control the jobcache size?
What is the recommended way to do it?

Thanks in advance!

P.S. We are using CDH 4.1.1.

-- 
Best Regards
Ivan Tretyakov

Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretyakov@griddynamics.com
