Message-ID: <32809004.19321272912358387.JavaMail.jira@thor>
Date: Mon, 3 May 2010 14:45:58 -0400 (EDT)
From: "Namit Jain (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: hive-dev@hadoop.apache.org
Subject: [jira] Commented: (HIVE-1332) Archiving partitions
In-Reply-To: <31144640.38141272653223302.JavaMail.jira@thor>
Mailing-List: contact hive-dev-help@hadoop.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863465#action_12863465 ]

Namit Jain commented on HIVE-1332:
----------------------------------

The same problem is present during
unarchive. Once the existence of a har file has been checked, two processes running concurrently can both create files, so we may duplicate the data.

> Archiving partitions
> --------------------
>
>                 Key: HIVE-1332
>                 URL: https://issues.apache.org/jira/browse/HIVE-1332
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>         Attachments: HIVE-1332.1.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. An issue is that as the number of files increases, there will be higher memory/load requirements on the namenode. Partitions in bucketed tables are a particular problem because they consist of many files, one for each of the buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE ARCHIVE PARTITION that would automatically put the files for the partition into a HAR file. We would also have an UNARCHIVE option to convert the files in the partition back to the original files. Archived partitions would be slower to access, but they would have the same functionality and decrease the number of files drastically. Typically, only seldom-accessed partitions would be archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the latest released major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
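
For illustration only: the race in the comment at the top of this message is a check-then-create problem. Two processes both see the har file, and both go on to create the extracted files. One common way to close that gap is to replace the separate check with a single atomic claim. The sketch below shows the idea on a local filesystem with two threads. It is not the HIVE-1332 patch and makes no claim about which HDFS primitive Hive would use. The marker name `_unarchive_in_progress`, the file name `part-00000`, and both function names are hypothetical.

```python
import os
import tempfile
import threading


def unarchive_once(partition_dir, worker_id, results):
    """Extract the partition only if this worker wins the claim."""
    # Hypothetical marker name, used purely for this sketch.
    claim = os.path.join(partition_dir, "_unarchive_in_progress")
    try:
        # os.mkdir either creates the directory or raises; there is no gap
        # between "check" and "create" for a second worker to slip into.
        os.mkdir(claim)
    except FileExistsError:
        results[worker_id] = "skipped"
        return
    # Only the claim holder appends the extracted data (a stand-in row here).
    with open(os.path.join(partition_dir, "part-00000"), "a") as out:
        out.write("row\n")
    results[worker_id] = "extracted"


def run_two_workers():
    """Run two concurrent unarchive attempts against one temp directory."""
    partition_dir = tempfile.mkdtemp()
    results = {}
    workers = [
        threading.Thread(target=unarchive_once, args=(partition_dir, i, results))
        for i in range(2)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    with open(os.path.join(partition_dir, "part-00000")) as f:
        rows = f.readlines()
    return results, rows
```

Calling `run_two_workers()` returns the per-worker outcome and the rows written. With the atomic claim in place, one worker extracts and the other skips, so the data is written once instead of twice.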