hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "DiskSetup" by SteveLoughran
Date Mon, 29 Mar 2010 15:26:24 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "DiskSetup" page has been changed by SteveLoughran.
The comment on this change is: stuff on temp directories and logging.
http://wiki.apache.org/hadoop/DiskSetup?action=diff&rev1=5&rev2=6

--------------------------------------------------

  
  == Configuring Hadoop ==
  
- Pass a list of disks to the dfs.data.dir parameter, Hadoop will use all of the disk that
are available.
+ Pass a list of disks to the `dfs.data.dir` parameter; Hadoop will use all of the disks that
are available. When a disk goes offline, it is taken out of consideration. Hadoop does not check
for the disk coming back; it assumes it is "gone". 
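
As a rough illustration of the "list of disks" above, the following sketch builds a `dfs.data.dir` property as a comma-separated list with one directory per disk. The mount points and the `hadoop/dfs/data` subdirectory are hypothetical names chosen for the example, not values from this page.

```python
import xml.etree.ElementTree as ET


def dfs_data_dir_property(mount_points, subdir="hadoop/dfs/data"):
    """Build a <property> element listing one data directory per disk.

    `subdir` is a hypothetical layout choice, not a Hadoop default.
    """
    # One directory per physical disk, joined with commas and no spaces.
    dirs = [f"{mp.rstrip('/')}/{subdir}" for mp in mount_points]
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical mount points, one per physical disk.
disks = ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]
print(ET.tostring(dfs_data_dir_property(disks), encoding="unicode"))
```

The printed element could then be pasted into the cluster's site configuration file.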
  
+ === Logging ===
+ 
+  * Don't log to the root directory, as having a machine that does not boot because the logs
are overflowing can be inconvenient.
+  * Have a plan to clean up log output, otherwise jobs that log too much to the console will
fill up log directories.
+  * Get your developers to use the commons-logging APIs in their MapReduce code, so that
you can turn logging up or down without recompiling the code. They can run in debug mode on
their test machines, while you run at WARN level in production.
+  * Some JVMs (JRockit) seem to log more. Tune your Log4j settings for your JVM, and only
capture the stuff you really want. 
+ 
+ === Do not keep stuff under /tmp ===
+ 
+ Hadoop defaults to keeping things under `/tmp` so that you can play with Hadoop without
filling up your disk. This is dangerous in a production cluster, as any automated cleanup
cron job (you will need one) will eventually delete stuff in `/tmp`, at which point your Hadoop
cluster is in trouble. 
+ 
+  * Plan the disk layout, and configure Hadoop to store stuff in stable locations, preferably
off the root disk.
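
One way to catch leftover `/tmp` locations before going to production is a small check over the configured directories. This is only a sketch: the `settings` dictionary below is hypothetical sample data, and a real check would read the values from the cluster's own configuration files.

```python
import posixpath


def under_tmp(path):
    """Return True if `path` is /tmp itself or lives below it."""
    norm = posixpath.normpath(path)
    return norm == "/tmp" or norm.startswith("/tmp/")


def risky_settings(settings):
    """Map each setting name to the directories it keeps under /tmp."""
    risky = {}
    for name, value in settings.items():
        # Values may be comma-separated lists of directories.
        bad = [p.strip() for p in value.split(",") if under_tmp(p.strip())]
        if bad:
            risky[name] = bad
    return risky


# Hypothetical sample configuration values.
settings = {
    "hadoop.tmp.dir": "/tmp/hadoop-hdfs",
    "dfs.data.dir": "/mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data",
}
for name, paths in risky_settings(settings).items():
    print(f"{name} still points under /tmp: {paths}")
```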
+  
  == Underlying File System Options ==
  
- If mount the disks as noatime, then the file access times aren't written back; this speeds
up reads. There is also relatime, which stores some access time information, but is not as
slow as the classic atime attribute. Remember that any access time information kept by Hadoop
is independent of the atime attribute of individual blocks, so Hadoop does not care what your
settings are here. If you are mounting disks purely for hadoop, use noatime.
+ If you mount the disks as `noatime`, then the file access times aren't written back; this speeds
up reads. There is also `relatime`, which stores some access time information, but is not
as slow as the classic atime attribute. Remember that any access time information kept by
Hadoop is independent of the atime attribute of individual blocks, so Hadoop does not care
what your settings are here. If you are mounting disks purely for Hadoop, use `noatime`.
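
To verify that the data disks really are mounted with `noatime`, something like the following sketch could be run against mount-table text in the Linux `/proc/mounts` layout (device, mount point, file system type, options, and so on). The device names and mount points in the sample are hypothetical.

```python
def mounts_missing_noatime(mounts_text, data_mounts):
    """Return the data mount points whose options do not include noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in data_mounts and "noatime" not in options:
            missing.append(mount_point)
    return missing


# Hypothetical sample in /proc/mounts layout.
sample = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sdb1 /mnt/disk1 ext3 rw,noatime 0 0
/dev/sdc1 /mnt/disk2 ext3 rw,relatime 0 0
"""
print(mounts_missing_noatime(sample, {"/mnt/disk1", "/mnt/disk2"}))
```

On a live Linux machine the same function could be given the contents of `/proc/mounts` instead of the sample string.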
  
- Formatting and tuning options are important. Using tunefs to set the reserve to zero percent
can save you over 25 GigaBytes on a 1 TeraByte disk. Also the underlying file system is going
to have many large files, you can get more space by lowering the number of inodes at format
time.
+ Formatting and tuning options are important. Using `tunefs` to set the reserve to zero percent
can save you over 25 gigabytes on a 1 terabyte disk. Also, since the underlying file system is
going to hold mostly large files, you can get more space by lowering the number of inodes at
format time.
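
The space reclaimed by lowering the reserve is simple arithmetic: disk size times reserve percentage. The sketch below assumes decimal units (1 TB = 10^12 bytes) and tries a few reserve percentages, including the 5% that is a common ext2/ext3 default; the actual reserve on a given disk is an assumption to check, not something stated on this page.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def reserved_bytes(disk_bytes, reserve_percent):
    """Bytes held back by the file system reserve (integer percent)."""
    return disk_bytes * reserve_percent // 100


# Assumed reserve percentages; check the real value on your own disks.
for percent in (5, 3, 1):
    saved_gb = reserved_bytes(1 * TB, percent) / GB
    print(f"{percent}% reserve on a 1 TB disk holds back {saved_gb:.0f} GB")
```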
+ 
  === Ext3 ===
  
- Yahoo! has publicly stated they use ext3. Regardless of the merits of the filesystem, that
means that HDFS-on-ext3 has been publicly tested at a bigger scale than any other underlying
filesystem.
+ Yahoo! has publicly stated they use ext3. Regardless of the merits of the filesystem, that
means that HDFS-on-ext3 has been publicly tested at a bigger scale than any other underlying
filesystem that we know of.
  
  
  === XFS ===
