From: "Mich Talebzadeh"
Reply-To: user@hive.apache.org
Subject: RE: Synchronizing Hive metastores across clusters
Date: Thu, 17 Dec 2015 16:46:51 -0000
Message-ID: <015801d138ea$979bfc20$c6d3f460$@peridale.co.uk>

Are both clusters in active/active mode, or is the cloud-based cluster a standby?

From: Elliot West [mailto:teabot@gmail.com]
Sent: 17 December 2015 16:21
To: user@hive.apache.org
Subject: Synchronizing Hive metastores across clusters

Hello,

I'm thinking about the steps required to repeatedly push Hive datasets out from a traditional Hadoop cluster into a parallel cloud-based cluster. This is not a one-off; it needs to be a constantly running sync process. As new tables and partitions are added in one cluster, they need to be synced to the cloud cluster. Assuming for a moment that I have the HDFS data syncing working, I'm wondering what steps I need to take to reliably ship the HCatalog metadata across. I use HCatalog as the point of truth as to when data is available and where it is located, and so I think that metadata is a critical element to replicate in the cloud-based cluster.

Does anyone have any recommendations on how to achieve this in practice? One issue (of many, I suspect) is that Hive appears to store table/partition locations internally with absolute, fully qualified URLs; therefore, unless the target cloud cluster is similarly named and configured, some path transformation step will be needed as part of the synchronisation process.

I'd appreciate any suggestions, thoughts, or experiences related to this.

Cheers - Elliot.

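For what it's worth, the path transformation step mentioned in the original message could look roughly like the sketch below. It is only an illustration in plain Python, not a Hive or HCatalog API: the function name `rewrite_location`, the name node address `onprem-nn:8020`, and the bucket `example-bucket` are all made up for the example, and it assumes every location in the source cluster sits under a single known root.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location: str, source_root: str, target_root: str) -> str:
    """Map a fully qualified table/partition location from one cluster root to another.

    Hypothetical helper: all three arguments are absolute URIs such as
    hdfs://host:port/path or s3a://bucket/path.
    """
    src = urlsplit(source_root)
    dst = urlsplit(target_root)
    loc = urlsplit(location)

    # The location must live on the same filesystem (scheme + authority) as the source root.
    if (loc.scheme, loc.netloc) != (src.scheme, src.netloc):
        raise ValueError(f"{location!r} is not on the source filesystem {source_root!r}")

    # The location path must be the source root itself or something below it.
    src_path = src.path.rstrip("/")
    if loc.path != src_path and not loc.path.startswith(src_path + "/"):
        raise ValueError(f"{location!r} is not under {source_root!r}")

    # Keep the part below the source root and graft it onto the target root.
    relative = loc.path[len(src_path):]
    return urlunsplit((dst.scheme, dst.netloc, dst.path.rstrip("/") + relative, "", ""))


if __name__ == "__main__":
    print(rewrite_location(
        "hdfs://onprem-nn:8020/warehouse/sales.db/orders/dt=2015-12-17",
        "hdfs://onprem-nn:8020/warehouse",
        "s3a://example-bucket/warehouse",
    ))
    # prints: s3a://example-bucket/warehouse/sales.db/orders/dt=2015-12-17
```

Rejecting locations outside the expected root, rather than passing them through unchanged, is a deliberate choice here: a table pointing somewhere unexpected is probably better surfaced as an error than silently synced with a path that does not exist in the target cluster.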