hive-issues mailing list archives

From "Ashutosh Bapat (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (HIVE-21893) Handle concurrent write + drop when ACID tables are getting bootstrapped.
Date Fri, 05 Jul 2019 10:37:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Bapat reassigned HIVE-21893:
-------------------------------------

    Assignee: Sankar Hariappan  (was: Ashutosh Bapat)

> Handle concurrent write + drop when ACID tables are getting bootstrapped.
> -------------------------------------------------------------------------
>
>                 Key: HIVE-21893
>                 URL: https://issues.apache.org/jira/browse/HIVE-21893
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication
>
> ACID tables will be bootstrapped during the incremental phase in a couple of cases:
> 1. hive.repl.bootstrap.acid.tables is set to true in the WITH clause of REPL DUMP.
> 2. The replication policy is changed using the REPLACE clause in REPL DUMP, and the ACID table matches the new policy but not the old policy.
> REPL DUMP performs the following sequence of operations, say in thread T1:
> 1. Get Last Repl ID (lastId)
> 2. Open Transaction (Tx1)
> 3. Dump events until lastId.
> 4. Get the list of tables in the given DB.
> 5. If table matches current policy, then bootstrap dump it.
> Concurrently, another thread (say T2) runs as follows:
> 11. Open Transaction (Tx2).
> 12. Insert into ACID table Tbl1.
> 13. Commit Transaction (Tx2)
> 14. Drop table (Tbl1). This is not necessarily done by the same thread; it may come from a different thread as well.
> *Problematic Use-cases:*
> 1. Step-11 happens between Step-1 and Step-2, Step-13 completes before REPL DUMP thread T1 forcefully aborts Tx2, and Step-14 is done after bootstrap is completed. In this case, bootstrap replicates the data/writeId written by Tx2, but the next incremental cycle also replicates the open_txn, allocate_writeid and commit_txn events, which duplicates the data.
> 2. Step-11 to Step-14 in thread T2 all happen after Step-1 in REPL DUMP thread T1. In this case, the table is not bootstrapped, but the corresponding open_txn, allocate_writeid, commit_txn and drop events are replicated in the next cycle. During that cycle, REPL LOAD fails on the commit_txn event because the table is dropped or the event is missing.
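
The two use-cases above can be reproduced with a toy model. The sketch below is not Hive code: `Source`, `bootstrap`, `incremental_load` and the event layout are hypothetical names made up for illustration, and the model ignores writeId bookkeeping and the forced abort of open transactions. It only shows why replaying events newer than lastId on top of a later table snapshot either duplicates rows or hits a missing table.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    kind: str          # "open_txn", "allocate_writeid", "commit_txn", "drop_table"
    table: str = ""
    rows: tuple = ()   # rows made visible by a commit_txn event


@dataclass
class Source:
    """Hypothetical stand-in for the source warehouse and its event log."""
    tables: dict = field(default_factory=dict)   # table name -> list of rows
    events: list = field(default_factory=list)

    def log(self, kind, table="", rows=()):
        self.events.append(Event(len(self.events) + 1, kind, table, tuple(rows)))

    def last_event_id(self):
        return len(self.events)

    def write_and_commit(self, table, rows):
        # Steps 11-13: open Tx2, insert into the ACID table, commit Tx2.
        self.log("open_txn")
        self.log("allocate_writeid", table)
        self.tables[table].extend(rows)
        self.log("commit_txn", table, rows)

    def drop(self, table):
        # Step 14: drop the table.
        del self.tables[table]
        self.log("drop_table", table)


def bootstrap(source, target):
    # Steps 4-5: copy the current snapshot of every table that still exists.
    for name, rows in source.tables.items():
        target[name] = list(rows)


def incremental_load(source, target, from_id):
    # Next cycle: replay every event newer than lastId on the target.
    for ev in source.events:
        if ev.event_id <= from_id:
            continue
        if ev.kind == "commit_txn":
            if ev.table not in target:
                raise LookupError(f"commit_txn for missing table {ev.table}")
            target[ev.table].extend(ev.rows)
        elif ev.kind == "drop_table":
            target.pop(ev.table, None)


def use_case_1():
    src, tgt = Source(tables={"Tbl1": []}), {}
    last_id = src.last_event_id()          # Step 1
    src.write_and_commit("Tbl1", ["r1"])   # Steps 11-13 finish before Tx2 is aborted
    bootstrap(src, tgt)                    # snapshot already contains r1
    incremental_load(src, tgt, last_id)    # commit_txn replays r1 again
    return tgt["Tbl1"]                     # Step 14 omitted so the rows stay visible


def use_case_2():
    src, tgt = Source(tables={"Tbl1": []}), {}
    last_id = src.last_event_id()          # Step 1
    src.write_and_commit("Tbl1", ["r1"])   # Steps 11-13
    src.drop("Tbl1")                       # Step 14
    bootstrap(src, tgt)                    # Tbl1 is gone, so it is not bootstrapped
    try:
        incremental_load(src, tgt, last_id)
    except LookupError as exc:
        return str(exc)
    return "ok"
```

In this model, `use_case_1()` ends with the same row twice on the target, and `use_case_2()` reports a failure while replaying commit_txn for a table that was never created there.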



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
