hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "FAQ" by SomeOtherAccount
Date Mon, 13 Jun 2011 17:05:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by SomeOtherAccount:
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=100&rev2=101

  Block replica files can be found on a DataNode in the storage directories specified by the configuration
parameter [[http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html#dfs.datanode.data.dir|dfs.datanode.data.dir]].
If the parameter is not set in the DataNode's {{{hdfs-site.xml}}}, then the default location
{{{/tmp}}} will be used. This default is intended for testing only. In a production
system it is an easy way to lose actual data, because the local OS may enforce cleanup policies
on {{{/tmp}}}. The parameter must therefore be overridden.<<BR>>
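  A minimal sketch of what such an override in {{{hdfs-site.xml}}} might look like (the paths shown are examples only, not defaults):
{{{
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Example paths; point these at dedicated local disks on each DataNode -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
}}}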
  If [[http://hadoop.apache.org/hdfs/docs/current/hdfs-default.html#dfs.datanode.data.dir|dfs.datanode.data.dir]]
correctly specifies the storage directories on all !DataNodes, then you might have real data
loss, which can result from faulty hardware or software bugs. If the file(s) containing the
missing blocks hold transient data or can be recovered from an external source, then
the easiest approach is to remove (and potentially restore) them. Run [[http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html#fsck|fsck]]
to determine which files have missing blocks. If you would like to investigate the cause of the data
loss further (highly appreciated), you can dig into the NameNode and DataNode
logs. From the logs one can track the entire life cycle of a particular block and its replicas.
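  As a rough illustration (the file path is hypothetical), fsck can be run against the whole namespace or against a single file:
{{{
# Summarize filesystem health and list files with corrupt or missing blocks
hadoop fsck / -list-corruptfileblocks

# Show block-level detail for one file (path shown is an example)
hadoop fsck /user/example/data.txt -files -blocks -locations
}}}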
  
+ == If a block size of 64MB is used and a file is written that uses less than 64MB, will 64MB of disk space be consumed? ==
+ 
+ Short answer: No.  
+ 
+ Longer answer:  Since HDFS does not do raw disk block storage, there are two block sizes
in use when writing a file in HDFS: the HDFS block size and the underlying file system's
block size.  HDFS will create files up to the size of the HDFS block size, as well as a meta
file that contains CRC32 checksums for that block.  The underlying file system stores that
file in increments of its block size on the actual raw disk, just as it would any other file.
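+ As a rough check (the file name and path are examples only), the space a small file actually occupies can be inspected with {{{hadoop fs -du}}}:
{{{
# Put a ~1MB file into an HDFS instance configured with a 64MB block size
hadoop fs -put small-file.txt /tmp/small-file.txt

# The reported size tracks the actual file length, not a full 64MB block;
# on the DataNode the corresponding blk_* replica file is likewise only ~1MB on disk
hadoop fs -du /tmp/small-file.txt
}}}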
  
  = Platform Specific =
  == Mac OS X ==
