hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elliot West <tea...@gmail.com>
Subject Re: Synchronizing Hive metastores across clusters
Date Thu, 17 Dec 2015 17:17:18 GMT
Hi Mich,

In your scenario is there any coordination of data syncing on HDFS and
metadata in HCatalog? I.e. could a situation occur where the replicated
metastore shows a partition as 'present' yet the data that backs the
partition in HDFS has not yet arrived at the replica filesystem? I Imagine
one could avoid this by snapshotting the source metastore, then syncing
HDFS, and then finally shipping the snapshot to the replica(?).

Thanks - Elliot.

On 17 December 2015 at 16:57, Mich Talebzadeh <mich@peridale.co.uk> wrote:

> Sounds like one way replication of metastore. Depending on your metastore
> platform that could be achieved pretty easily.
>
>
>
> Mine is Oracle and I use Materialised View replication which is pretty
> good but no latest technology. Others would be GoldenGate or SAP
> replication server.
>
>
>
> HTH,
>
>
>
> Mich
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 17 December 2015 16:47
> *To:* user@hive.apache.org
> *Subject:* RE: Synchronizing Hive metastores across clusters
>
>
>
> Are both clusters in active/active mode or the cloud based cluster is
> standby?
>
>
>
> *From:* Elliot West [mailto:teabot@gmail.com <teabot@gmail.com>]
> *Sent:* 17 December 2015 16:21
> *To:* user@hive.apache.org
> *Subject:* Synchronizing Hive metastores across clusters
>
>
>
> Hello,
>
>
>
> I'm thinking about the steps required to repeatedly push Hive datasets out
> from a traditional Hadoop cluster into a parallel cloud based cluster. This
> is not a one off, it needs to be a constantly running sync process. As new
> tables and partitions are added in one cluster, they need to be synced to
> the cloud cluster. Assuming for a moment that I have the HDFS data syncing
> working, I'm wondering what steps I need to take to reliably ship the
> HCatalog metadata across. I use HCatalog as the point of truth as to when
> when data is available and where it is located and so I think that metadata
> is a critical element to replicate in the cloud based cluster.
>
>
>
> Does anyone have any recommendations on how to achieve this in practice?
> One issue (of many I suspect) is that Hive appears to store table/partition
> locations internally with absolute, fully qualified URLs, therefore unless
> the target cloud cluster is similarly named and configured some path
> transformation step will be needed as part of the synchronisation process.
>
>
>
> I'd appreciate any suggestions, thoughts, or experiences related to this.
>
>
>
> Cheers - Elliot.
>
>
>
>
>

Mime
View raw message