hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <>
Subject [jira] [Created] (HIVE-7973) Hive Replication Support
Date Thu, 04 Sep 2014 01:14:54 GMT
Sushanth Sowmyan created HIVE-7973:

             Summary: Hive Replication Support
                 Key: HIVE-7973
             Project: Hive
          Issue Type: Bug
          Components: Import/Export
            Reporter: Sushanth Sowmyan

The need for replication is common to many database management systems, and it is important
for Hive to evolve support for such a tool as part of its ecosystem. Hive already supports
EXPORT and IMPORT commands, which can be used to dump out tables, distcp them to another
cluster, and import/create from that. If we had a mechanism by which exports and imports
could be automated, it would establish the base on which replication can be developed.
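
As a rough illustration of the manual flow described above, the following sketch builds the
three steps as command strings. The function name, table name, and staging paths are
hypothetical; only the EXPORT TABLE, IMPORT TABLE, and hadoop distcp command forms come from
Hive and Hadoop themselves. It only generates text and does not run anything.

```python
def replication_commands(table, staging_src, staging_dst):
    """Return the three manual replication steps, in order, as strings.

    All arguments are illustrative placeholders chosen by the caller.
    """
    return [
        # 1. On the source cluster: dump the table's data and metadata.
        f"EXPORT TABLE {table} TO '{staging_src}';",
        # 2. Copy the dump from the source cluster to the destination cluster.
        f"hadoop distcp {staging_src} {staging_dst}",
        # 3. On the destination cluster: create/load the table from the dump.
        f"IMPORT TABLE {table} FROM '{staging_dst}';",
    ]


if __name__ == "__main__":
    # Hypothetical table and namenode addresses, for illustration only.
    for step in replication_commands(
        "sales",
        "hdfs://src-nn:8020/staging/sales",
        "hdfs://dst-nn:8020/staging/sales",
    ):
        print(step)
```

Automating replication would amount to generating and running steps like these without a
person in the loop.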

One place where this kind of automation can be developed is with the aid of the HiveMetaStoreEventHandler
mechanisms, to generate notifications when certain changes are committed to the metastore,
and then translate those notifications into export actions, distcp actions, and import actions
on another cluster.
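
A minimal sketch of what such a translation step might look like follows. It is not an existing
Hive API: the event dictionary layout, the event type names, the Action tuple, and the staging
path scheme are all assumptions made for illustration. It maps one metastore notification to an
ordered list of export, distcp, and import actions, and ignores event types it does not
replicate.

```python
from collections import namedtuple

# Hypothetical action record: what to run, and on which cluster.
Action = namedtuple("Action", ["kind", "cluster", "command"])

# Hypothetical event types that should trigger replication.
REPLICATED_EVENTS = {"CREATE_TABLE", "ADD_PARTITION"}


def partition_clause(spec):
    """Render a partition spec dict as a PARTITION (...) clause, or ''."""
    if not spec:
        return ""
    pairs = ", ".join(f'{key}="{value}"' for key, value in sorted(spec.items()))
    return f" PARTITION ({pairs})"


def translate(event, src_root, dst_root):
    """Translate one notification (assumed dict layout) into ordered actions."""
    if event["type"] not in REPLICATED_EVENTS:
        return []  # not something this sketch replicates
    table = event["table"]
    part = partition_clause(event.get("partition"))
    # Stage each event in its own directory so dumps do not collide.
    stage = f"{table}/{event['id']}"
    src, dst = f"{src_root}/{stage}", f"{dst_root}/{stage}"
    return [
        Action("export", "source", f"EXPORT TABLE {table}{part} TO '{src}';"),
        Action("distcp", "source", f"hadoop distcp {src} {dst}"),
        Action("import", "destination", f"IMPORT TABLE {table}{part} FROM '{dst}';"),
    ]


if __name__ == "__main__":
    event = {"id": 17, "type": "ADD_PARTITION", "table": "sales",
             "partition": {"ds": "2014-09-03"}}
    for action in translate(event, "hdfs://src-nn:8020/repl", "hdfs://dst-nn:8020/repl"):
        print(action.cluster, action.command)
```

Keeping the translation a pure function from event to action list would let the same logic be
driven by whatever notification transport is eventually chosen.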

Part of that already exists in the notification system that is part of hcatalog-server-extensions.
Initially, this was developed to be able to trigger a JMS notification, which an Oozie workflow
can use to start off actions keyed on the finishing of a job that used HCatalog to write
to a table. While this currently lives under hcatalog, its purpose has a scope well past
hcatalog alone, and it can be used as-is without the use of HCatalog IF/OF.
This can be extended with the help of a library that does the aforementioned translation.
I also think that these sections should live in a core Hive module, rather than being tucked
away inside hcatalog.

Once we have rudimentary support for table & partition replication, we can then move on
to further requirements of replication, such as metadata replication (such as replication
of changes to roles, etc.), and/or optimize away the requirement to distcp and use webhdfs instead.

This Story tracks all the bits that go into development of such a system - I'll create multiple
smaller tasks inside this as we go on.

This message was sent by Atlassian JIRA
