hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Archiving" by PaulYang
Date Tue, 02 Nov 2010 21:30:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Archiving" page has been changed by PaulYang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Archiving?action=diff&rev1=5&rev2=6

--------------------------------------------------

  
  Due to the design of HDFS, the number of files in the filesystem directly affects the memory
consumption in the namenode. While normally not a problem for small clusters, memory usage
may hit the limits of accessible memory on a single machine when there are >50-100 million
files. In such situations, it is advantageous to have as few files as possible.
  
- The use of [[http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html | Hadoop
Archives]] is one approach to reducing the number of files in partitions. Hive has built-in
support that allows users to easily move files in existing partitions to a Hadoop Archive
(HAR) so that a partition that may once have consisted of 100's of files occupy ~3 files (depending
on settings) However, the trade off is that queries may be slower due to the additional overhead
in indirection.
+ The use of [[http://hadoop.apache.org/mapreduce/docs/r0.21.0/hadoop_archives.html | Hadoop
Archives]] is one approach to reducing the number of files in partitions. Hive has built-in
support to convert files in existing partitions to a Hadoop Archive (HAR) so that a partition
that may once have consisted of hundreds of files can occupy just ~3 files (depending on settings).
However, the trade-off is that queries may be slower due to the additional overhead in reading
from the HAR.
  
  Note that archiving does NOT compress the files - HAR is analogous to the unix tar command.
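
  The tar analogy can be illustrated with a small sketch. It uses Python's standard tarfile
module, not HAR itself, so the on-disk layout of a real HAR is not modeled here; the file
names and sizes are made up for illustration. It shows only the two properties described
above: many small files are bundled into one container, and the bytes are not compressed.

```python
import io
import tarfile


def bundle_uncompressed(files):
    """Pack a {name: bytes} mapping into one uncompressed tar archive in memory."""
    buf = io.BytesIO()
    # Mode "w" writes a plain tar stream with no compression applied.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical partition contents: 200 small, highly compressible files.
partition_files = {"part-%05d" % i: b"x" * 1024 for i in range(200)}
archive = bundle_uncompressed(partition_files)
payload_size = sum(len(data) for data in partition_files.values())

print(len(partition_files), "files bundled into 1 archive")
print("payload bytes:", payload_size, "archive bytes:", len(archive))
```

  Even though the payload here would compress very well, the archive is no smaller than the
payload, because tar adds headers and padding and applies no compression. The benefit is in
the reduced file count, not in the bytes stored.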
  
