hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-18988) Support bootstrap replication of ACID tables
Date Tue, 01 May 2018 05:44:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-18988:
------------------------------------
    Attachment: HIVE-18988.07.patch

> Support bootstrap replication of ACID tables
> --------------------------------------------
>
>                 Key: HIVE-18988
>                 URL: https://issues.apache.org/jira/browse/HIVE-18988
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: ACID, DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>         Attachments: HIVE-18988.01.patch, HIVE-18988.02.patch, HIVE-18988.03.patch, HIVE-18988.04.patch, HIVE-18988.05.patch, HIVE-18988.06.patch, HIVE-18988.07.patch
>
>
> Bootstrapping ACID tables needs special handling to replicate a stable state of the data.
>  - If the ACID feature is enabled, perform the bootstrap dump for ACID tables within a read txn.
>  -> Dump table/partition metadata.
>  -> Get the list of valid data files for a table using the same logic as a read txn.
>  -> Dump the latest ValidWriteIdList as per the current read txn.
>  - Set the valid last replication state such that it doesn't miss any open txn started after the bootstrap dump is triggered.
>  - If any txns that were opened before the bootstrap dump was triggered are still ongoing, it is not guaranteed that the open_txn event is captured for them. Also, if these txns were opened for streaming ingest, the dumped ACID table data may include data of open txns, which impacts snapshot isolation at the target. To avoid that, the bootstrap dump should wait for these txns until a timeout expires (new configuration: hive.repl.bootstrap.dump.open.txn.timeout). After the timeout, force-abort those txns and continue.
>  - If any force-aborted txn belongs to a streaming ingest case, the dumped ACID table data may contain aborted data too. So it is necessary to replicate the aborted write ids to the target to mark that data invalid for any readers.
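
The wait-then-abort step described above can be illustrated with a small, hedged sketch. This is not the code from the attached patch: TxnStore, wait_then_abort_open_txns and all parameter names are hypothetical, the transaction table is an in-memory stand-in for the metastore, and timeout_s only plays the role that hive.repl.bootstrap.dump.open.txn.timeout plays in the description.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TxnStore:
    """Hypothetical in-memory stand-in for the metastore's transaction table."""
    open_txns: set = field(default_factory=set)
    aborted_txns: set = field(default_factory=set)

    def abort(self, txn_id):
        # Move a txn from "open" to "aborted".
        self.open_txns.discard(txn_id)
        self.aborted_txns.add(txn_id)


def wait_then_abort_open_txns(store, txns_before_dump, timeout_s,
                              poll_s=0.01, clock=time.monotonic,
                              sleep=time.sleep):
    """Wait up to timeout_s for txns opened before the dump to finish,
    then force-abort whichever are still open.

    Returns the sorted ids of the force-aborted txns. In the description,
    the write ids of such txns are what must be replicated to the target
    so that readers treat their data as invalid.
    """
    deadline = clock() + timeout_s
    # Only txns that were open before the dump started are of interest.
    pending = set(txns_before_dump) & store.open_txns
    while pending and clock() < deadline:
        sleep(poll_s)
        # Drop txns that committed or aborted on their own while we waited.
        pending &= store.open_txns
    for txn_id in sorted(pending):
        store.abort(txn_id)
    return sorted(pending)
```

In this sketch, txns opened after the dump was triggered are never passed in, so they are left alone; per the description, they are expected to be picked up through the last replication state instead.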



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
