hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Archiving" by PaulYang
Date Tue, 02 Nov 2010 21:25:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Archiving" page has been changed by PaulYang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Archiving?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  == Overview ==
  
- Due to the design of HDFS, the number of files in the filesystem directly affects the memory
consumption in the namenode. While normally not a problem for small clusters, memory usage
may hit the limits of accessible memory on a single machine when there are >50-100 million
files. Consequently, it is advantageous to have as few files as possible.
+ Due to the design of HDFS, the number of files in the filesystem directly affects the memory
consumption in the namenode. While normally not a problem for small clusters, memory usage
may hit the limits of accessible memory on a single machine when there are >50-100 million
files. In such situations, it is advantageous to have as few files as possible.
  
- The use of [[http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html | Hadoop
Archives]] is one approach to reducing the number of files in a partition. Hive has built-in
support that allows users to easily move files in existing partitions to a Hadoop Archive
(HAR) file so that a partition that may once have consisted of hundreds of files occupies ~3 files
(depending on settings). However, the trade-off is that queries may be slower due to the additional
overhead of indirection.
+ The use of [[http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html | Hadoop
Archives]] is one approach to reducing the number of files in partitions. Hive has built-in
support that allows users to easily move files in existing partitions to a Hadoop Archive
(HAR) so that a partition that may once have consisted of hundreds of files occupies ~3 files (depending
on settings). However, the trade-off is that queries may be slower due to the additional overhead
of indirection.
  
  Note that archiving does NOT compress the files - HAR is analogous to the Unix tar command.
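
The file-count argument in the overview can be illustrated with some rough arithmetic. The sketch below is not part of the wiki page: the figure of about 150 bytes of namenode heap per file or block object is an assumed rule of thumb, the function names are hypothetical, and the model ignores that the part files inside an archive may span many blocks. It only shows why going from hundreds of files per partition to ~3 matters at the scale of tens of millions of files.

```python
# Rough namenode heap estimate for the file counts discussed above.
# BYTES_PER_OBJECT is an assumed rule of thumb, not a figure from the wiki page.
BYTES_PER_OBJECT = 150  # assumption: approximate heap per file or block object


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap needed to track num_files files with blocks_per_file blocks each."""
    objects = num_files * (1 + blocks_per_file)  # one file object plus its block objects
    return objects * BYTES_PER_OBJECT


def files_after_archiving(num_partitions: int, files_per_archive: int = 3) -> int:
    """File count if each partition is packed into one HAR occupying ~3 files."""
    return num_partitions * files_per_archive


if __name__ == "__main__":
    partitions = 1_000_000           # hypothetical cluster size
    before = partitions * 100        # hypothetical: 100 files per partition
    after = files_after_archiving(partitions)
    for label, count in (("before", before), ("after", after)):
        gigabytes = namenode_heap_bytes(count) / 1e9
        print(f"{label}: {count:,} files, ~{gigabytes:.1f} GB of namenode heap")
```

Under these assumptions the savings come entirely from the smaller number of namespace objects; the data itself is unchanged, which matches the note that archiving does not compress anything.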
  
