Date: Thu, 31 Jul 2014 05:41:39 +0000 (UTC)
From: "Mithun Radhakrishnan (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances


     [ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mithun Radhakrishnan updated HIVE-7341:
---------------------------------------
    Status: Patch Available  (was: Open)

> Support for Table replication across HCatalog instances
> -------------------------------------------------------
>
>                 Key: HIVE-7341
>                 URL: https://issues.apache.org/jira/browse/HIVE-7341
>             Project: Hive
>          Issue Type: New Feature
>          Components: HCatalog
>    Affects Versions: 0.13.1
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch
>
>
> The HCatClient currently doesn't provide very much support for replicating HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) instances.
> Systems similar to Apache Falcon might find the need to replicate partition data between 2 clusters, and keep the HCatalog metadata in sync between the two. This poses a couple of problems:
> # The definition of the source table might change (in column schema, I/O formats, record-formats, serde-parameters, etc.) The system will need a way to diff 2 tables and update the target-metastore with the changes. E.g.
> {code}
> targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
> hcatClient.updateTableSchema(dbName, tableName, targetTable);
> {code}
> # The current {{HCatClient.addPartitions()}} API requires that the partition's schema be derived from the table's schema, thereby requiring that the table-schema be resolved *before* partitions with the new schema are added to the table. This is problematic, because it introduces race conditions when 2 partitions with differing column-schemas (e.g. right after a schema change) are copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track of the partition's schema, in flight.
> # The source and target metastores might be running different/incompatible versions of Hive.
> The impending patch attempts to address these concerns (with some caveats).
> # {{HCatTable}} now has
> ## a {{diff()}} method, to compare against another HCatTable instance
> ## a {{resolve(diff)}} method to copy over specified table-attributes from another HCatTable
> ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in other class-loaders may be used for comparison
> # {{HCatPartition}} now provides finer-grained control over a Partition's column-schema, StorageDescriptor settings, etc. This allows partitions to be copied completely from source, with the ability to override specific properties if required (e.g. location).
> # {{HCatClient.updateTableSchema()}} can now update the entire table-definition, not just the column schema.
> # I've cleaned up and removed most of the redundancy between the HCatTable, HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to separate the table-attributes from the add-table-operation's attributes. By providing fluent-interfaces in HCatTable, and composing an HCatTable instance in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are deprecated, in favour of those in HCatTable. Likewise, HCatPartition and HCatAddPartitionDesc.
> I'll post a patch for trunk shortly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
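
The diff-then-resolve idea described in the issue can be sketched in isolation. The following is a rough illustration only, not the patch's code: TableDef, Attribute, and their fields are hypothetical stand-ins invented for this example and are not the HCatalog classes. It assumes diff() reports which attribute groups differ and resolve() copies just those groups from the other table, mirroring the call shape in the {code} snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableDiffSketch {

    // Hypothetical attribute groups; not the enum used by HCatalog.
    enum Attribute { COLUMNS, INPUT_FORMAT, OUTPUT_FORMAT, SERDE_PARAMETERS }

    // Hypothetical stand-in for a table definition; not HCatTable.
    static final class TableDef {
        List<String> columns;                  // ordered "name type" entries
        String inputFormat;
        String outputFormat;
        Map<String, String> serdeParameters;

        TableDef(List<String> columns, String inputFormat, String outputFormat,
                 Map<String, String> serdeParameters) {
            this.columns = new ArrayList<>(columns);
            this.inputFormat = inputFormat;
            this.outputFormat = outputFormat;
            this.serdeParameters = new LinkedHashMap<>(serdeParameters);
        }

        // Report which attribute groups of this table differ from 'other'.
        EnumSet<Attribute> diff(TableDef other) {
            EnumSet<Attribute> changed = EnumSet.noneOf(Attribute.class);
            if (!columns.equals(other.columns)) changed.add(Attribute.COLUMNS);
            if (!inputFormat.equals(other.inputFormat)) changed.add(Attribute.INPUT_FORMAT);
            if (!outputFormat.equals(other.outputFormat)) changed.add(Attribute.OUTPUT_FORMAT);
            if (!serdeParameters.equals(other.serdeParameters)) changed.add(Attribute.SERDE_PARAMETERS);
            return changed;
        }

        // Copy only the listed attribute groups from 'other' into this table.
        TableDef resolve(TableDef other, EnumSet<Attribute> attributes) {
            for (Attribute attribute : attributes) {
                switch (attribute) {
                    case COLUMNS:
                        columns = new ArrayList<>(other.columns);
                        break;
                    case INPUT_FORMAT:
                        inputFormat = other.inputFormat;
                        break;
                    case OUTPUT_FORMAT:
                        outputFormat = other.outputFormat;
                        break;
                    case SERDE_PARAMETERS:
                        serdeParameters = new LinkedHashMap<>(other.serdeParameters);
                        break;
                }
            }
            return this;
        }
    }

    public static void main(String[] args) {
        // Source gained a column and changed a serde parameter; formats are unchanged.
        TableDef source = new TableDef(
                Arrays.asList("id int", "name string", "ts timestamp"),
                "text", "text", Collections.singletonMap("field.delim", ","));
        TableDef target = new TableDef(
                Arrays.asList("id int", "name string"),
                "text", "text", Collections.singletonMap("field.delim", "|"));

        EnumSet<Attribute> changes = target.diff(source);
        System.out.println("Differing attributes: " + changes);

        target.resolve(source, changes);
        System.out.println("Remaining differences: " + target.diff(source));
    }
}
```

In this sketch, passing the diff explicitly to resolve() lets a caller drop attribute groups it does not want copied (for example, keeping a target-specific setting) before applying the rest.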