hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "DiskSetup" by SteveLoughran
Date Wed, 31 Mar 2010 12:32:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "DiskSetup" page has been changed by SteveLoughran.
The comment on this change is: more on /tmp.
http://wiki.apache.org/hadoop/DiskSetup?action=diff&rev1=8&rev2=9

--------------------------------------------------

  
  === Do not keep stuff under /tmp ===
  
- Hadoop defaults to keeping things under `/tmp` so that you can play with Hadoop without
filling up your disk. This is dangerous in a production cluster, as any automated cleanup
cron job -you will need one- will eventually delete stuff in `/tmp`, at which point your Hadoop
cluster is in trouble. 
+  1. Hadoop defaults to keeping things under `/tmp` so that you can play with Hadoop without
filling up your disk. This is dangerous in a production cluster, as any automated cleanup
cron job will eventually delete stuff in `/tmp`, at which point your Hadoop cluster is in
trouble. 
+  1. You will need a cron job to clean up `/tmp` eventually. Plan for it.
+  1. Configure Hadoop to store stuff in stable locations, preferably off that root disk.
+  1. The JVM stores the performance data that `jps` reads under `/tmp/hsperfdata_${user}`; after
the cleanup, `jps` won't work. Have your script leave those directories alone, or get used to
using `ps -ef | grep java` to find Java processes instead. 
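The cleanup advice above can be sketched as a small script for that cron job. This is a minimal Python sketch under stated assumptions, not anything shipped with Hadoop: the function name `clean_tmp`, the one-week age limit and the `hsperfdata_*` skip pattern are illustrative choices. It deletes old regular files under `/tmp` but never descends into the JVM's `hsperfdata_*` directories, so `jps` keeps working.

```python
import fnmatch
import os
import time

# Hypothetical policy values (assumptions): tune them for your own cluster.
MAX_AGE_SECONDS = 7 * 24 * 3600            # delete files untouched for a week
PROTECTED_DIR_PATTERNS = ["hsperfdata_*"]  # JVM perf data that jps reads


def clean_tmp(root="/tmp", max_age=MAX_AGE_SECONDS, now=None, dry_run=True):
    """Delete old regular files under root, never entering protected dirs.

    Returns the paths that were deleted (or, with dry_run, would be).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune protected directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in PROTECTED_DIR_PATTERNS)
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: judge symlinks by their own mtime, never follow them.
                if os.lstat(path).st_mtime < cutoff:
                    if not dry_run:
                        os.remove(path)
                    removed.append(path)
            except OSError:
                # Vanished meanwhile, or not ours to delete: skip it.
                pass
    return removed
```

Run it with the default `dry_run=True` first and review the list before letting cron delete anything; only files are removed, so empty directories are left for a later pass.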
  
+ 
-  * Plan the disk layout, configure Hadoop to store stuff in stable locations, preferably
off that root disk.
-  
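To check that a configuration really keeps Hadoop's data out of `/tmp`, a minimal sketch can parse the `*-site.xml` files. It assumes the Hadoop 0.20-era property names `hadoop.tmp.dir`, `dfs.name.dir`, `dfs.data.dir` and `mapred.local.dir`, and their stock defaults, which derive from `hadoop.tmp.dir` (itself defaulting to `/tmp/hadoop-${user.name}`). The helper names are hypothetical, and later releases renamed some of these properties.

```python
import xml.etree.ElementTree as ET

# Assumption: Hadoop 0.20-era names and defaults. Later releases renamed some
# (e.g. dfs.name.dir became dfs.namenode.name.dir); adjust for your version.
DEFAULT_TMP_DIR = "/tmp/hadoop-${user.name}"
STORAGE_DEFAULTS = {
    "dfs.name.dir": "${hadoop.tmp.dir}/dfs/name",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}


def read_properties(xml_text):
    """Return {name: value} from the text of a Hadoop *-site.xml file."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def _under_tmp(path):
    return path == "/tmp" or path.startswith("/tmp/")


def storage_under_tmp(props):
    """List (property, path) pairs whose effective location is under /tmp."""
    tmp_dir = props.get("hadoop.tmp.dir", DEFAULT_TMP_DIR)
    problems = []
    if _under_tmp(tmp_dir):
        problems.append(("hadoop.tmp.dir", tmp_dir))
    for name, default in STORAGE_DEFAULTS.items():
        # These properties take a comma-separated list of directories.
        for path in props.get(name, default).split(","):
            path = path.strip().replace("${hadoop.tmp.dir}", tmp_dir)
            if path.startswith("file://"):
                path = path[len("file://"):]
            if _under_tmp(path):
                problems.append((name, path))
    return problems
```

The properties are spread across `core-site.xml`, `hdfs-site.xml` and `mapred-site.xml`, so merge the dictionaries first, e.g. `props = {**read_properties(core_xml), **read_properties(hdfs_xml)}`, before calling `storage_under_tmp(props)`.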
  == Underlying File System Options ==
  
  If you mount the disks as `noatime`, file access times aren't written back, so reads no longer
trigger metadata writes; this speeds up reads. There is also `relatime`, which records some
access time information but updates it far less often than classic `atime`, so it is cheaper.
Any access time information that Hadoop keeps is independent of the atime attribute of the
local files holding individual blocks, so Hadoop does not care what your settings are here. If
you are mounting disks purely for Hadoop, use `noatime`.
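To audit which Hadoop data directories actually sit on `noatime` mounts, here is a minimal sketch assuming Linux and `/proc/mounts`-style input (device, mount point, type, options, dump, pass). The function names are hypothetical, and octal escapes such as `\040` in mount point names are not decoded.

```python
import posixpath


def parse_mounts(text):
    """Map mount point -> list of options from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Later lines overwrite earlier ones: the last mount on a point wins.
            mounts[fields[1]] = fields[3].split(",")
    return mounts


def mount_for(path, mounts):
    """Return the mount point holding path (longest matching prefix)."""
    path = posixpath.normpath(path)
    best = None
    for mp in mounts:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def dirs_without_noatime(data_dirs, mounts_text):
    """List (dir, mount point, atime mode) for dirs not on a noatime mount."""
    mounts = parse_mounts(mounts_text)
    report = []
    for d in data_dirs:
        mp = mount_for(d, mounts)
        if mp is None or "noatime" in mounts[mp]:
            continue
        opts = mounts[mp]
        mode = next((o for o in ("relatime", "strictatime", "atime") if o in opts),
                    "unspecified")
        report.append((d, mp, mode))
    return report
```

On a live node this would be called as `dirs_without_noatime(data_dirs, open("/proc/mounts").read())`, with `data_dirs` taken from the `dfs.data.dir` and `mapred.local.dir` settings.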
