hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16676) Bootstrap REPL DUMP should ensure no data loss due to concurrent operations.
Date Tue, 30 May 2017 07:13:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-16676:
------------------------------------
    Description: 
During a bootstrap dump, if a table is renamed after the table names have been fetched, the
renamed table is missing from the dump, so the target database ends up with neither the old
nor the new table. During incremental replication, later RENAME events are then no-ops
because the old table doesn't exist in the target.

To generalise the solution for this issue, the following logic is proposed.
1. Each table should store its CREATE event ID in the table parameters. If a table goes
through a Create -> Drop -> Create sequence, this makes it easy to tell whether the table is
the old one or the new one.
2. Bootstrap should write the delta changes into the dumpDir as an incremental dump.
3. After the bootstrap dump completes, traverse the events starting from bootDumpBeginReplId.
   If a RENAME event is found, check the following:
   - If the source table is dumped and the create event ID matches, dump the RENAME event
     as such.
   - If the source table is dumped but its create event ID is later than the event, skip
     the event.
   - If the source table doesn't exist but the target table exists, skip the event.
   - If both the source and target tables are missing, dump the target table to the
     bootstrap dumpDir.
4. For other events, dump the event according to the following logic:
   - CREATE: If the object exists, skip the event; otherwise dump it.
   - DROP: If the object doesn't exist, skip the event; otherwise dump it.
   - ALTER: If the object exists and the create event ID matches, dump the event; otherwise
     skip it.
5. When loading a RENAME event, check the source table; if its create event ID is the same,
apply the event.
6. If the source table doesn't exist, check whether the target table exists; if it does,
skip the event.
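
The dump-side rules in steps 3 and 4 can be read as a small decision table. The Python sketch
below is only an illustration under stated assumptions, not Hive's implementation: `dumped`
(a map from each bootstrap-dumped table name to its stored CREATE event ID), the `RenameEvent`
fields, the `Action` names, and the functions `decide_rename` and `decide_other` are all
hypothetical. It also assumes that "exists" means "is present in the bootstrap dump", and it
skips the one case the description leaves open (source table dumped, create event ID neither
matching nor later than the event).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DUMP_EVENT = "dump the event as such"
    SKIP = "skip the event"
    DUMP_TARGET_TABLE = "dump the target table to the bootstrap dumpDir"


@dataclass(frozen=True)
class RenameEvent:
    # All fields are hypothetical; the description does not define an event layout.
    event_id: int         # ID of the RENAME event itself
    old_name: str         # source table name
    new_name: str         # target table name
    table_create_id: int  # CREATE event ID of the table the event applies to


def decide_rename(dumped, event):
    """Step 3: decide what to do with a RENAME event seen after the bootstrap dump."""
    src_create_id = dumped.get(event.old_name)
    if src_create_id is not None:
        if src_create_id == event.table_create_id:
            return Action.DUMP_EVENT
        if src_create_id > event.event_id:
            # The dumped table was created after this event: a newer table, same name.
            return Action.SKIP
        # Not covered by the description; skipping here is an assumption.
        return Action.SKIP
    if event.new_name in dumped:
        return Action.SKIP
    return Action.DUMP_TARGET_TABLE


def decide_other(dumped, kind, name, table_create_id):
    """Step 4: decide what to do with a CREATE, DROP or ALTER event."""
    exists = name in dumped
    if kind == "CREATE":
        return Action.SKIP if exists else Action.DUMP_EVENT
    if kind == "DROP":
        return Action.DUMP_EVENT if exists else Action.SKIP
    if kind == "ALTER":
        matches = exists and dumped[name] == table_create_id
        return Action.DUMP_EVENT if matches else Action.SKIP
    raise ValueError(f"unsupported event kind: {kind}")
```

For example, with `dumped = {"t1": 10}`, a RENAME of `t1` whose `table_create_id` is 10 is
dumped as such, while a RENAME from a table that is absent from the dump to a name that is
also absent results in dumping the target table.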


  was:
Currently, RENAME TABLE and RENAME PARTITION events are treated as ALTER events. 
During a bootstrap dump, if a table is renamed after the table names have been fetched, the
renamed table is missing from the dump, so the target database ends up with neither the old
nor the new table. During incremental replication, later RENAME events are then no-ops
because the old table doesn't exist in the target.
To keep RENAME replication simple, it is suggested to treat RENAME as a DROP+CREATE
event.
EVENT_RENAME_TABLE = EVENT_DROP_TABLE + EVENT_CREATE_TABLE.
EVENT_RENAME_PARTITION = EVENT_DROP_PARTITION + EVENT_ADD_PARTITION.
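
This earlier proposal amounts to rewriting each RENAME event as a pair of events. A minimal
Python sketch, assuming events are plain (type, name) tuples: the function `expand_rename`
and the tuple layout are hypothetical, and only the event type names come from the
description.

```python
# Mapping taken from the two equations in the description.
RENAME_EXPANSION = {
    "EVENT_RENAME_TABLE": ("EVENT_DROP_TABLE", "EVENT_CREATE_TABLE"),
    "EVENT_RENAME_PARTITION": ("EVENT_DROP_PARTITION", "EVENT_ADD_PARTITION"),
}


def expand_rename(event_type, old_name, new_name=None):
    """Return the list of (type, name) events that replaces the given event."""
    if event_type not in RENAME_EXPANSION:
        # Non-RENAME events pass through unchanged.
        return [(event_type, old_name)]
    drop_type, create_type = RENAME_EXPANSION[event_type]
    # Drop the object under its old name, then create it under the new name.
    return [(drop_type, old_name), (create_type, new_name)]
```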


> Bootstrap REPL DUMP should ensure no data loss due to concurrent operations.
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16676
>                 URL: https://issues.apache.org/jira/browse/HIVE-16676
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)