hive-user mailing list archives

From Eugene Koifman <ekoif...@hortonworks.com>
Subject Re: Synchronizing Hive metastores across clusters
Date Thu, 17 Dec 2015 19:22:52 GMT
The metastore supports MetaStoreEventListener and MetaStorePreEventListener, which may be useful
here.

Eugene

From: Elliot West <teabot@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, December 17, 2015 at 8:21 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Synchronizing Hive metastores across clusters

Hello,

I'm thinking about the steps required to repeatedly push Hive datasets out from a traditional
Hadoop cluster into a parallel cloud-based cluster. This is not a one-off; it needs to be
a constantly running sync process. As new tables and partitions are added in one cluster,
they need to be synced to the cloud cluster. Assuming for a moment that I have the HDFS data
syncing working, I'm wondering what steps I need to take to reliably ship the HCatalog metadata
across. I use HCatalog as the point of truth as to when data is available and where it
is located, so I think that metadata is a critical element to replicate in the cloud-based
cluster.
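
One way to picture a single pass of such a sync loop is sketched below. It is a hypothetical illustration, not an existing Hive or HCatalog API: partitions are modelled as plain (database, table, partition name) tuples, and `copy_data` / `register_partition` are caller-supplied callbacks standing in for the HDFS copy and the metastore update. The one design assumption it encodes is that, because HCatalog is treated as the point of truth for availability, a partition's metadata is published on the target only after its data has been copied.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: (database, table, partition_name)
PartitionRef = Tuple[str, str, str]


def sync_once(
    source_parts: Iterable[PartitionRef],
    target_parts: Iterable[PartitionRef],
    copy_data: Callable[[PartitionRef], None],
    register_partition: Callable[[PartitionRef], None],
) -> List[PartitionRef]:
    """Run one pass of a hypothetical sync loop and return what was synced.

    Partitions present on the source but not yet on the target are processed
    in a deterministic (sorted) order.
    """
    missing = sorted(set(source_parts) - set(target_parts))
    for part in missing:
        copy_data(part)           # 1. make the files exist on the target first
        register_partition(part)  # 2. only then publish the metadata to readers
    return missing
```

A long-running process would call `sync_once` repeatedly (for example on a schedule), re-reading both sides' partition lists each time so that a failed pass is simply retried on the next run.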

Does anyone have any recommendations on how to achieve this in practice? One issue (of many,
I suspect) is that Hive appears to store table/partition locations internally as absolute,
fully qualified URLs; therefore, unless the target cloud cluster is similarly named and configured,
some path transformation step will be needed as part of the synchronisation process.
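
A minimal sketch of such a path transformation step follows, assuming the rewrite is a simple swap of the filesystem root (scheme and authority, plus an optional base path on the target) while the rest of the location is kept intact. The function name and the host/bucket names in the example are hypothetical; a real sync process would apply this to each table and partition location before writing it into the target metastore.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location: str, source_fs: str, target_fs: str) -> str:
    """Map a fully qualified location from the source filesystem root to the
    target one, keeping the path (and any query string) unchanged.

    source_fs / target_fs are roots such as "hdfs://nn1.example.com:8020"
    or "s3a://example-bucket/replica" (hypothetical names).
    """
    src = urlsplit(source_fs)
    dst = urlsplit(target_fs)
    loc = urlsplit(location)
    # Refuse to rewrite locations that are not on the source filesystem;
    # guessing here could silently point metadata at the wrong data.
    if (loc.scheme, loc.netloc) != (src.scheme, src.netloc):
        raise ValueError(f"{location!r} is not on {source_fs!r}")
    # Prefix any base path configured on the target root (e.g. a bucket prefix).
    new_path = dst.path.rstrip("/") + loc.path
    return urlunsplit((dst.scheme, dst.netloc, new_path, loc.query, loc.fragment))
```

For example, `rewrite_location("hdfs://nn1.example.com:8020/apps/hive/warehouse/sales.db/orders/dt=2015-12-17", "hdfs://nn1.example.com:8020", "s3a://example-bucket/replica")` yields `"s3a://example-bucket/replica/apps/hive/warehouse/sales.db/orders/dt=2015-12-17"`. Raising on unexpected roots, rather than passing them through, is a deliberate choice so that tables stored outside the expected warehouse surface as errors.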

I'd appreciate any suggestions, thoughts, or experiences related to this.

Cheers - Elliot.


