hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.
Date Fri, 14 Jul 2017 21:11:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088069#comment-16088069 ]

Sankar Hariappan edited comment on HIVE-16990 at 7/14/17 9:10 PM:
------------------------------------------------------------------

Added 01.patch with the updates below.
- TableSerializer and PartitionSerializer now set the current repl state only for bootstrap
dump. For incremental dump, this is done by load.
- Repl load tracks the modified metadata objects using the new UpdatedMetadataTracker object.
This replaces the dbsUpdated and tablesUpdated maps.
- Added alter tasks to update the current repl state of the updated metadata objects.
These alter tasks are added after applying each event, which increases the number of tasks
per event. The overall execution time of the replication test cases has also increased as a
result. Will try to optimise later.
- Made ReplCopyTasks throw an error if any of the listed files is missing from both the original
path and cmpath. Corrected the test cases to handle this failure case.
- Removed unused or dead code wherever found.
- Added a new test case to verify the repl status on failure and to ensure that retrying a failed
dump works after the fix.
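
To illustrate the copy-task behaviour described in the list above, here is a minimal sketch of a source lookup that fails when a listed file is missing from both locations. It is not the code from the patch: the class name CopySourceSketch, the method resolveSource, and the choice of UncheckedIOException are assumptions made for the example, and it uses local java.nio paths rather than Hadoop file system calls.

```java
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical sketch; these names do not come from the Hive code base. */
final class CopySourceSketch {

    private CopySourceSketch() {
    }

    /**
     * Picks the location to copy a listed file from: the original path if the
     * file is still there, otherwise the cmpath copy. Throws if neither exists,
     * so the failure is reported instead of the file being silently skipped.
     */
    static Path resolveSource(Path originalPath, Path cmPath) {
        if (Files.exists(originalPath)) {
            return originalPath;
        }
        if (Files.exists(cmPath)) {
            return cmPath;
        }
        throw new UncheckedIOException(new NoSuchFileException(
                originalPath.toString(), cmPath.toString(),
                "file is missing from both the original path and cmpath"));
    }
}
```

Failing here means the event is not treated as applied, so a later retry can pick it up again.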

Request [~daijy]/[~sushanth]/[~anishek]/[~thejas] to review the patch!







> REPL LOAD should update last repl ID only after successful copy of data files.
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16990
>                 URL: https://issues.apache.org/jira/browse/HIVE-16990
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive, repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, replication
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16990.01.patch
>
>
> REPL LOAD operations that include both metadata and data changes should follow the
> rule below.
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and ensures
> no data loss due to failures.
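
The three-step rule above can be sketched as follows. This is an illustration under stated assumptions, not Hive's implementation: ReplLoadOrderSketch, applyEvent and lastReplIdOf are hypothetical names, failures are modelled as unchecked exceptions thrown by the steps, and skipping an event whose ID is not newer than the stored last repl ID is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names are Hive APIs. */
final class ReplLoadOrderSketch {

    /** Last repl ID per object, standing in for the repl state stored with the metadata. */
    private final Map<String, Long> lastReplId = new HashMap<>();

    long lastReplIdOf(String objectName) {
        return lastReplId.getOrDefault(objectName, 0L);
    }

    /**
     * Applies one event to one object. Returns true if the event was applied,
     * false if it was skipped because the object already reflects it.
     * If either step throws, the last repl ID is left unchanged.
     */
    boolean applyEvent(String objectName, long eventId,
                       Runnable copyMetadata, Runnable copyDataFiles) {
        if (eventId <= lastReplIdOf(objectName)) {
            return false; // assumption: events at or below the stored ID are already applied
        }
        copyMetadata.run();                  // step 1: metadata, excluding the last repl ID
        copyDataFiles.run();                 // step 2: data files
        lastReplId.put(objectName, eventId); // step 3: reached only if steps 1 and 2 succeeded
        return true;
    }
}
```

Because the ID is written last, an event whose data copy fails still looks unapplied, so running the load again re-applies it.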



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
