hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-21029) External table replication for existing deployments running incremental replication.
Date Thu, 31 Jan 2019 04:48:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-21029:
------------------------------------
    Attachment: HIVE-21029.04.patch

> External table replication for existing deployments running incremental replication.
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-21029
>                 URL: https://issues.apache.org/jira/browse/HIVE-21029
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 3.0.0, 3.1.0, 3.1.1
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21029.01.patch, HIVE-21029.02.patch, HIVE-21029.03.patch, HIVE-21029.04.patch
>
>
> Existing deployments using Hive replication do not get external tables replicated. To
> enable external table replication, such deployments will have to provide a specific
> switch that first bootstraps external tables as part of Hive incremental replication, after
> which incremental replication will take care of further changes to external tables.
> The switch will be provided by an additional Hive configuration (for example,
> hive.repl.bootstrap.external.tables) and is to be used in the WITH clause of the
> REPL DUMP command.
> Additionally, the existing Hive config hive.repl.include.external.tables will always
> have to be set to "true" in the above clause.
> Proposed usage for enabling external table replication on an existing replication policy:
> 1. Consider an ongoing repl policy <db1> in the incremental phase.
> Enable hive.repl.include.external.tables=true and hive.repl.bootstrap.external.tables=true
> in the next incremental REPL DUMP.
> - Dumps all events but skips events related to external tables.
> - Instead, combines the bootstrap dump for all external tables under the "_bootstrap" directory.
> - Also includes the data locations file "_external_tables_info".
> - The LIMIT or TO clause shouldn’t be used, to ensure the latest events are dumped before
> bootstrap dumping external tables.
> 2. REPL LOAD on this dump applies all the events first, copies external table data, and
> then bootstraps the external tables (metadata).
> - It is possible that the external tables (metadata) are not point-in-time consistent
> with the rest of the tables.
> - But they will be eventually consistent once the next incremental load is applied.
> - This REPL LOAD is fault tolerant and can be retried if it fails.
> 3. All future REPL DUMPs on this repl policy should set hive.repl.bootstrap.external.tables=false.
> - If it is not set to false, the target might end up with an inconsistent set of external
> tables, as bootstrap wouldn’t clean up any dropped external tables.
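
The three steps above can be sketched as the following command sequence. This is an illustration only, not taken from the patch: it assumes the Hive 3.x-era REPL DUMP ... FROM ... WITH and REPL LOAD ... FROM syntax, and it uses the config name hive.repl.bootstrap.external.tables that the description itself gives only as an example. db1 is the policy name from the description; <last_repl_id>, <next_repl_id> and <dump_dir> are placeholders to be filled in from the previous dump's output, and any target-side settings are not shown.

```sql
-- Step 1 (source cluster): next incremental dump that also bootstraps external tables.
-- No TO or LIMIT clause, so the latest events are dumped before the bootstrap.
REPL DUMP db1 FROM <last_repl_id>
  WITH ('hive.repl.include.external.tables'='true',
        'hive.repl.bootstrap.external.tables'='true');

-- Step 2 (target cluster): load that dump; per the description it can be retried on failure.
REPL LOAD db1 FROM '<dump_dir>';

-- Step 3 (source cluster): every later incremental dump switches the bootstrap flag off.
REPL DUMP db1 FROM <next_repl_id>
  WITH ('hive.repl.include.external.tables'='true',
        'hive.repl.bootstrap.external.tables'='false');
```

Leaving the bootstrap flag set to true in step 3 would repeat the bootstrap on each dump, which is the case the description warns about, since dropped external tables would not be cleaned up on the target.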



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
