hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.
Date Tue, 18 Jun 2019 04:40:00 GMT


ASF GitHub Bot logged work on HIVE-21763:

                Author: ASF GitHub Bot
            Created on: 18/Jun/19 04:39
            Start Date: 18/Jun/19 04:39
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #673: HIVE-21763: Incremental replication to allow changing include/exclude tables list in replication policy.

 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
 @@ -894,12 +894,20 @@ replDumpStatement
 @after { popMsg(state); }
       : KW_REPL KW_DUMP
         (dbName=identifier) (DOT tablePolicy=replTableLevelPolicy)?
+        (KW_REPLACE replacePolicy=replReplacePolicy)?
         (KW_FROM (eventId=Number)
           (KW_TO (rangeEnd=Number))?
           (KW_LIMIT (batchSize=Number))?
         (KW_WITH replConf=replConfigs)?
-    -> ^(TOK_REPL_DUMP $dbName $tablePolicy? ^(TOK_FROM $eventId (TOK_TO $rangeEnd)? (TOK_LIMIT $batchSize)?)? $replConf?)
+    -> ^(TOK_REPL_DUMP $dbName $tablePolicy? $replacePolicy? ^(TOK_FROM $eventId (TOK_TO $rangeEnd)? (TOK_LIMIT $batchSize)?)? $replConf?)
+    ;
 Review comment:
   They are the same. The only difference is that replReplacePolicy additionally takes the TOK_REPLACE token. I think this can be changed to share common code. I will make this change.

Issue Time Tracking

    Worklog Id:     (was: 262031)
    Time Spent: 1h 20m  (was: 1h 10m)

> Incremental replication to allow changing include/exclude tables list in replication policy.
> --------------------------------------------------------------------------------------------
>                 Key: HIVE-21763
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication, pull-request-available
>         Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
> - REPL DUMP takes 2 inputs along with the existing FROM and WITH clauses.
> {code}
> - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy>] FROM <last_repl_id> WITH <key_values_list>;
> - current_repl_policy and previous_repl_policy can be in any format mentioned in Point-4.
> - The REPLACE clause takes the previous repl policy as input. If the REPLACE clause is absent, the policy remains unchanged.
> - The rest of the format remains the same.
> {code}
> - Now, REPL DUMP on this DB will replicate the tables based on current_repl_policy.
> - Single table replication of the format <db_name>.t1 doesn’t allow changing the policy dynamically. So the REPLACE clause is not allowed if previous_repl_policy is of this format.
> - Any table that is added dynamically, either because the regular expression changed or because it was added to the include list, should be bootstrapped using an independent table-level replication policy.
> {code}
> - Hive will automatically figure out the list of newly included tables by comparing the current_repl_policy and previous_repl_policy inputs, and will combine the bootstrap dump for the added tables with the incremental dump. A "_bootstrap" directory can be created in the dump dir to accommodate all tables to be bootstrapped.
> - If any table is renamed, it may get dynamically added to or removed from replication based on the defined replication policy + include/exclude list. So, Hive will perform bootstrap for a table that is newly included after a rename.
> {code}
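The comparison of current_repl_policy against previous_repl_policy described above can be sketched roughly as follows. This is only an illustration, not the code in the pull request: the class `ReplPolicyDiff`, its methods, and the modelling of a policy as one include regex plus an optional exclude regex are all assumptions made for this example.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: models a replication policy as an
// include regex plus an optional exclude regex over table names.
public class ReplPolicyDiff {

    // A table is selected if it fully matches the include regex and does not
    // fully match the exclude regex (when one is given).
    public static Predicate<String> policy(String includeRegex, String excludeRegex) {
        Pattern include = Pattern.compile(includeRegex);
        Pattern exclude = excludeRegex == null ? null : Pattern.compile(excludeRegex);
        return table -> include.matcher(table).matches()
                && (exclude == null || !exclude.matcher(table).matches());
    }

    // Tables selected by the current policy but not by the previous one.
    // These are the candidates for a bootstrap dump alongside the incremental dump.
    public static Set<String> tablesToBootstrap(List<String> tables,
                                                Predicate<String> previous,
                                                Predicate<String> current) {
        Set<String> added = new TreeSet<>();
        for (String table : tables) {
            if (current.test(table) && !previous.test(table)) {
                added.add(table);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t1", "t2", "t3", "orders");
        Predicate<String> previous = policy("t[0-9]+", "t3");        // selects t1, t2
        Predicate<String> current = policy("t[0-9]+|orders", "t1");  // selects t2, t3, orders
        System.out.println(tablesToBootstrap(tables, previous, current)); // prints [orders, t3]
    }
}
```

In this sketch, t2 is covered by both policies and continues with incremental replication only, while t3 and orders are newly included and would need a bootstrap.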
> - REPL LOAD should check for changes in the repl policy and drop the tables/views that are excluded in the new policy compared to the previous policy. This should be done before performing the incremental and bootstrap load from the current dump.
> - REPL LOAD on an incremental dump should load the event directories first, then check for the "_bootstrap" directory and perform a bootstrap load on it.
> Rename table is not in scope of this jira.
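
The ordering that the description asks of REPL LOAD (drop newly excluded tables, then load events, then bootstrap) might be sketched like this. It is a minimal illustration under stated assumptions: the class `ReplLoadPlan`, the step labels, and the idea that event directories are named by numeric event ID are all hypothetical. Only the "_bootstrap" directory name comes from the issue text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical planner, not part of Hive: turns the contents of a dump
// directory into an ordered list of load steps.
public class ReplLoadPlan {

    // Directory name taken from the issue description.
    static final String BOOTSTRAP_DIR = "_bootstrap";

    public static List<String> plan(List<String> tablesToDrop, List<String> dumpDirEntries) {
        List<String> steps = new ArrayList<>();

        // 1. Drop tables/views excluded by the new policy before loading anything.
        for (String table : tablesToDrop) {
            steps.add("DROP " + table);
        }

        // 2. Load event directories in ascending event-ID order.
        //    Assumption: event directories have purely numeric names.
        List<String> events = dumpDirEntries.stream()
                .filter(entry -> entry.matches("\\d+"))
                .sorted(Comparator.comparingLong((String entry) -> Long.parseLong(entry)))
                .collect(Collectors.toList());
        for (String event : events) {
            steps.add("LOAD_EVENT " + event);
        }

        // 3. Only after the events, bootstrap the newly included tables, if any.
        if (dumpDirEntries.contains(BOOTSTRAP_DIR)) {
            steps.add("BOOTSTRAP_LOAD " + BOOTSTRAP_DIR);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(List.of("t1"), List.of("_bootstrap", "12", "9", "10")));
    }
}
```

Returning the steps as a list keeps the ordering explicit and easy to check without touching a real warehouse or file system.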

This message was sent by Atlassian JIRA
