hive-issues mailing list archives

From "Ashutosh Bapat (JIRA)" <>
Subject [jira] [Assigned] (HIVE-21893) Handle concurrent write + drop when ACID tables are getting bootstrapped.
Date Fri, 05 Jul 2019 10:37:00 GMT


Ashutosh Bapat reassigned HIVE-21893:

    Assignee: Sankar Hariappan  (was: Ashutosh Bapat)

> Handle concurrent write + drop when ACID tables are getting bootstrapped.
> -------------------------------------------------------------------------
>                 Key: HIVE-21893
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication
> ACID tables are bootstrapped during the incremental phase in a couple of cases:
> 1. hive.repl.bootstrap.acid.tables is set to true in the WITH clause of REPL DUMP.
> 2. The replication policy is changed using the REPLACE clause in REPL DUMP, and the ACID table matches the new policy but not the old one.
> REPL DUMP performs the following sequence of operations. Call this thread T1.
> 1. Get Last Repl ID (lastId)
> 2. Open Transaction (Tx1)
> 3. Dump events until lastId.
> 4. Get the list of tables in the given DB.
> 5. If table matches current policy, then bootstrap dump it.
> Suppose another thread (T2) runs concurrently as follows.
> 11. Open Transaction (Tx2).
> 12. Insert into ACID table Tbl1.
> 13. Commit Transaction (Tx2)
> 14. Drop table (Tbl1). This need not be the same thread; the drop may come from a different thread as well.
> *Problematic Use-cases:*
> 1. Step-11 happens between Step-1 and Step-2, and Step-13 completes before the REPL DUMP thread T1 forcefully aborts Tx2. Also assume Step-14 is done after the bootstrap completes. In this case, the bootstrap replicates the data/writeId written by Tx2, but the next incremental cycle also replicates the open_txn, allocate_writeid and commit_txn events, which duplicates the data.
> 2. Step-11 to Step-14 in thread T2 happen after Step-1 in the REPL DUMP thread T1. In this case, the table is not bootstrapped, but the corresponding open_txn, allocate_writeid, commit_txn and drop events are replicated in the next cycle. During that cycle, REPL LOAD fails on the commit_txn event because the table is dropped or the event is missing.
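
The two use cases above can be reproduced in a toy model. The sketch below is not Hive code: Warehouse, Event, t2_write, t2_drop, bootstrap and replay are hypothetical names, tables are plain lists of rows, and the event log is a Python list. It only illustrates why replaying events newer than lastId on top of a bootstrap taken later can duplicate data or fail.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy stand-in for a warehouse: table name -> list of rows (assumption)."""
    tables: dict = field(default_factory=dict)


@dataclass
class Event:
    event_id: int
    kind: str      # "open_txn", "allocate_writeid", "commit_txn" or "drop_table"
    table: str = ""
    rows: tuple = ()


def append(log, kind, table="", rows=()):
    """Record an event with the next sequential id."""
    log.append(Event(len(log) + 1, kind, table, tuple(rows)))


def t2_write(source, log, table, rows):
    """Steps 11-13: open Tx2, insert into the table, commit."""
    append(log, "open_txn")
    append(log, "allocate_writeid", table)
    source.tables[table].extend(rows)
    append(log, "commit_txn", table, rows)


def t2_drop(source, log, table):
    """Step 14: drop the table on the source."""
    del source.tables[table]
    append(log, "drop_table", table)


def bootstrap(source, target):
    """Steps 4-5: copy every table that exists on the source right now."""
    for name, rows in source.tables.items():
        target.tables[name] = list(rows)


def replay(target, log, last_id):
    """Next incremental cycle: apply every event newer than last_id."""
    for ev in log:
        if ev.event_id <= last_id:
            continue
        if ev.kind == "commit_txn":
            if ev.table not in target.tables:
                raise RuntimeError(
                    f"commit_txn failed: table {ev.table} does not exist on target")
            target.tables[ev.table].extend(ev.rows)
        elif ev.kind == "drop_table":
            target.tables.pop(ev.table, None)


def use_case_1():
    source, target, log = Warehouse({"Tbl1": []}), Warehouse(), []
    last_id = len(log)                     # Step-1: lastId captured first
    t2_write(source, log, "Tbl1", ["r1"])  # Tx2 commits before the bootstrap
    bootstrap(source, target)              # bootstrap already contains r1
    replay(target, log, last_id)           # commit_txn applies r1 a second time
    return target.tables["Tbl1"]           # Step-14 omitted to keep the rows visible


def use_case_2():
    source, target, log = Warehouse({"Tbl1": []}), Warehouse(), []
    last_id = len(log)                     # Step-1
    t2_write(source, log, "Tbl1", ["r1"])  # Steps 11-13
    t2_drop(source, log, "Tbl1")           # Step-14 before tables are listed
    bootstrap(source, target)              # Tbl1 is gone, nothing is copied
    try:
        replay(target, log, last_id)
    except RuntimeError as err:
        return str(err)
    return "ok"


print(use_case_1())
print(use_case_2())
```

In this model, use_case_1 ends with the row present twice on the target, and use_case_2 stops at the commit_txn event because the target never received Tbl1.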

This message was sent by Atlassian JIRA
