hadoop-hive-dev mailing list archives

From "Paul Yang (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1332) Archiving partitions
Date Fri, 30 Apr 2010 18:47:03 GMT
Archiving partitions
--------------------

                 Key: HIVE-1332
                 URL: https://issues.apache.org/jira/browse/HIVE-1332
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Metastore
    Affects Versions: 0.6.0
            Reporter: Paul Yang
            Assignee: Paul Yang


Partitions and tables in Hive typically consist of many files on HDFS. As the number of
files increases, so do the memory and load requirements on the namenode. Partitions in
bucketed tables are a particular problem because they consist of many files, one for each
bucket.

One way to drastically reduce the number of files is to use hadoop archives:
http://hadoop.apache.org/common/docs/current/hadoop_archives.html

This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION <spec> statement
that would automatically put the files for the partition into a HAR file. We would also have
an UNARCHIVE option to convert the archived partition back to the original files. Archived
partitions would be slower to access, but they would have the same functionality and far
fewer files. Typically, only seldom-accessed partitions would be archived.
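
As a rough sketch of how the proposed statements might look in use: the table name
page_views and the partition column ds below are hypothetical, and the exact UNARCHIVE
syntax is an assumption that mirrors the ARCHIVE form described above, not a settled design.

```sql
-- Hypothetical table and partition spec, for illustration only.
-- Proposed: pack all files of one partition into a single HAR file.
ALTER TABLE page_views ARCHIVE PARTITION (ds='2010-04-01');

-- Assumed counterpart: restore the partition's original files from the archive.
ALTER TABLE page_views UNARCHIVE PARTITION (ds='2010-04-01');
```

Queries against the archived partition would be written exactly as before; only the
underlying file layout (and access speed) would change.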

Hadoop archives are still somewhat new, so we'll only support the latest released major
version (0.20). The following Hadoop bug fixes are relevant:

https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause data
loss without this fix)
https://issues.apache.org/jira/browse/HADOOP-6645
https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

