hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1332) Archiving partitions
Date Mon, 03 May 2010 17:35:57 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863416#action_12863416 ]

Namit Jain commented on HIVE-1332:
----------------------------------

DDLSemanticAnalyzer.java

    private void analyzeAlterTableArchive(CommonTree ast, boolean isUnArchive)
        throws SemanticException {

      if (!conf.getBoolVar(HiveConf.ConfVars.HIVEARCHIVEENABLED)) {
        throw new SemanticException("Archiving methods are currently disabled. " +
            "Please see the Hive wiki for more information about enabling archiving.");

      }
      String tblName = unescapeIdentifier(ast.getChild(0).getText());
      // partition name to value
      List<Map<String, String>> partSpecs = getPartitionSpecs(ast);
      if (partSpecs.size() > 1 ) {
        throw new SemanticException(isUnArchive ? "UNARCHIVE" : "ARCHIVE" +
            " can only be run on a single partition");
      }
      if (partSpecs.size() == 0) {
        throw new SemanticException("ARCHIVE can only be run on partitions");


Add the error messages in ErrorMsg.java, and add negative tests for all of them.
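
A rough sketch of what centralizing these messages could look like. The enum, its constant names, and the validate helper are hypothetical and are not taken from the actual ErrorMsg.java or from the patch. Using separate ARCHIVE and UNARCHIVE constants also sidesteps an operator-precedence pitfall in the quoted snippet: string concatenation binds tighter than the conditional operator, so the UNARCHIVE branch there appears to produce only "UNARCHIVE" as its message.

```java
import java.util.List;
import java.util.Map;

class ArchiveErrorSketch {

  // Hypothetical constants; the real ErrorMsg.java entries may be named differently.
  enum ArchiveErrorMsg {
    ARCHIVE_METHODS_DISABLED("Archiving methods are currently disabled. "
        + "Please see the Hive wiki for more information about enabling archiving."),
    ARCHIVE_ON_MULTI_PARTS("ARCHIVE can only be run on a single partition"),
    UNARCHIVE_ON_MULTI_PARTS("UNARCHIVE can only be run on a single partition"),
    ARCHIVE_ON_TABLE("ARCHIVE can only be run on partitions");

    private final String msg;

    ArchiveErrorMsg(String msg) {
      this.msg = msg;
    }

    String getMsg() {
      return msg;
    }
  }

  // Returns the error to report for the given partition specs, or null if they pass.
  static ArchiveErrorMsg validate(List<Map<String, String>> partSpecs, boolean isUnArchive) {
    if (partSpecs.size() > 1) {
      return isUnArchive ? ArchiveErrorMsg.UNARCHIVE_ON_MULTI_PARTS
                         : ArchiveErrorMsg.ARCHIVE_ON_MULTI_PARTS;
    }
    if (partSpecs.isEmpty()) {
      return ArchiveErrorMsg.ARCHIVE_ON_TABLE;
    }
    return null;
  }
}
```

Each constant then maps naturally to one negative test: build the offending input, call the check, and compare against the expected constant rather than a string literal.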



DDLTask.java
    // Means user specified a table
    if (simpleDesc.getPartSpec() == null) {
      throw new HiveException("ARCHIVE is for partitions only");
    }

Shouldn't this be checked in DDLSemanticAnalyzer instead?



Same as above:

    if (tbl.getTableType() != TableType.MANAGED_TABLE)  {
      throw new HiveException("ARCHIVE can only be performed on managed tables");
    }


and:

    if (isArchived(p)) {
      throw new HiveException("Specified partition is already archived");
    }


One check that seems to be missing:

Suppose the table has multiple partition columns, say ds and hr.

If the user tries to archive by specifying only ds, should that be allowed?
I don't think it will work. Are you checking for that?
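
A minimal sketch of such a check, assuming the table's partition column names and the user-supplied spec are available as a list and a map. The class and method names are hypothetical and not part of the patch; a real check would read the column names from the table's metadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PartSpecCheckSketch {

  // Returns the partition columns that the user-supplied spec leaves unspecified.
  // An empty result means the spec names every partition column of the table.
  static List<String> missingPartitionColumns(List<String> partCols,
                                              Map<String, String> partSpec) {
    List<String> missing = new ArrayList<String>();
    for (String col : partCols) {
      if (!partSpec.containsKey(col)) {
        missing.add(col);
      }
    }
    return missing;
  }
}
```

For a table partitioned by ds and hr, a spec that gives only ds would come back with hr listed as missing, and the analyzer could reject the statement with a dedicated error message.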




> Archiving partitions
> --------------------
>
>                 Key: HIVE-1332
>                 URL: https://issues.apache.org/jira/browse/HIVE-1332
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>         Attachments: HIVE-1332.1.patch
>
>
> Partitions and tables in Hive typically consist of many files on HDFS. An issue is that
> as the number of files increases, there will be higher memory/load requirements on the namenode.
> Partitions in bucketed tables are a particular problem because they consist of many files,
> one for each of the buckets.
> One way to drastically reduce the number of files is to use hadoop archives:
> http://hadoop.apache.org/common/docs/current/hadoop_archives.html
> This feature would introduce an ALTER TABLE <table_name> ARCHIVE PARTITION <spec>
> that would automatically put the files for the partition into a HAR file. We would also have
> an UNARCHIVE option to convert the files in the partition back to the original files. Archived
> partitions would be slower to access, but they would have the same functionality and decrease
> the number of files drastically. Typically, only seldom-accessed partitions would be archived.
> Hadoop archives are still somewhat new, so we'll only put in support for the latest released
> major version (0.20). Here are some bug fixes:
> https://issues.apache.org/jira/browse/HADOOP-6591 (Important - could potentially cause
> data loss without this fix)
> https://issues.apache.org/jira/browse/HADOOP-6645
> https://issues.apache.org/jira/browse/MAPREDUCE-1585

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

