hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-21763) Incremental replication to allow changing include/exclude tables list in replication policy.
Date Tue, 18 Jun 2019 09:49:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21763?focusedWorklogId=262179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262179 ]

ASF GitHub Bot logged work on HIVE-21763:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jun/19 09:48
            Start Date: 18/Jun/19 09:48
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #673: HIVE-21763: Incremental replication to allow changing include/exclude tables list in replication policy.
URL: https://github.com/apache/hive/pull/673#discussion_r294704914
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##########
 @@ -364,6 +367,35 @@ private void cleanTablesFromBootstrap() throws HiveException, IOException, Inval
     }
   }
 
+  /**
+   * If replication policy is changed between previous and current load, then the excluded tables in
+   * the new replication policy will be dropped.
+   * @throws HiveException Failed to get/drop the tables.
+   */
+  private void dropTablesExcludedInReplScope(ReplScope replScope) throws HiveException {
+    // If all tables are included in replication scope, then nothing to be dropped.
+    if ((replScope == null) || replScope.includeAllTables()) {
+      return;
+    }
+
+    Hive db = getHive();
+    String dbName = replScope.getDbName();
+
+    // List all the tables that are excluded in the current repl scope.
+    Iterable<String> tableNames = Collections2.filter(db.getAllTables(dbName),
+        tableName -> {
+          assert(tableName != null);
+          return !tableName.toLowerCase().startsWith(
 
 Review comment:
   I get your point, but this is an Iterable. We list the tables with getAllTables, and the loop
below iterates over them and drops each table when it finds a match, so it is effectively the same
as what you expect. Also, I am not sure we can handle exceptions inside the filter method. In any
case, the result is the same, since we don't have any double loops.
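
To illustrate the single-pass argument, here is a minimal sketch under stated assumptions. It uses a lazy java.util.stream filter in place of Guava's Collections2.filter so that it is self-contained. The class name, the isIncluded predicate, and the dropTable callback are hypothetical stand-ins, not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch, not Hive code: filter and drop in a single pass.
class ExcludedTableDropSketch {
    /**
     * Drops every table that the (assumed) inclusion predicate rejects.
     * The stream filter is lazy, so each name is tested and, if excluded,
     * dropped during the same traversal. There is no nested loop.
     */
    static List<String> dropExcluded(List<String> allTables,
                                     Predicate<String> isIncluded,
                                     Consumer<String> dropTable) {
        List<String> dropped = new ArrayList<>();
        allTables.stream()
                .filter(name -> !isIncluded.test(name.toLowerCase()))
                .forEach(name -> {
                    dropTable.accept(name); // stand-in for the real drop call
                    dropped.add(name);
                });
        return dropped;
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t1", "T2", "tmp_a");
        // Assumed policy for the example: include "t*" tables except "tmp*".
        Predicate<String> included = n -> n.startsWith("t") && !n.startsWith("tmp");
        List<String> dropped = dropExcluded(tables, included, n -> { });
        System.out.println(dropped);
    }
}
```

Exceptions thrown inside the filter lambda would have to be unchecked, which is one reason to keep the actual drop call, and its error handling, in the consuming loop.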
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 262179)
    Time Spent: 2h 20m  (was: 2h 10m)

> Incremental replication to allow changing include/exclude tables list in replication policy.
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21763
>                 URL: https://issues.apache.org/jira/browse/HIVE-21763
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, Replication, pull-request-available
>         Attachments: HIVE-21763.01.patch, HIVE-21763.02.patch, HIVE-21763.03.patch
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> - REPL DUMP takes 2 inputs along with existing FROM and WITH clause.
> {code}
> - REPL DUMP <current_repl_policy> [REPLACE <previous_repl_policy> FROM <last_repl_id> WITH <key_values_list>;
> - current_repl_policy and previous_repl_policy can be any format mentioned in Point-4.
> - The REPLACE clause takes the previous repl policy as input. If the REPLACE clause is absent, the policy remains unchanged.
> - Rest of the format remains same.
> {code}
> - Now, REPL DUMP on this DB will replicate the tables based on current_repl_policy.
> - Single table replication of the format <db_name>.t1 doesn’t allow changing the policy dynamically, so the REPLACE clause is not allowed if previous_repl_policy is of this format.
> - Any table added dynamically, either due to a change in the regular expression or by being added to the include list, should be bootstrapped using an independent table-level replication policy.
> {code}
> - Hive will automatically figure out the list of newly included tables by comparing the current_repl_policy & previous_repl_policy inputs, and will combine the bootstrap dump for the added tables with the incremental dump. A "_bootstrap" directory can be created in the dump dir to accommodate all tables to be bootstrapped.
> - If any table is renamed, it may get dynamically added/removed for replication based on the defined replication policy + include/exclude list. So, Hive will perform bootstrap for a table that is newly included after a rename.
> {code}
> - REPL LOAD should check for changes in the repl policy and drop the tables/views excluded in the new policy compared to the previous policy. This should be done before performing the incremental and bootstrap load from the current dump.
> - REPL LOAD on an incremental dump should load the events directories first, then check for the "_bootstrap" directory and perform bootstrap load on its contents.
> Rename table is not in scope of this jira.
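
The policy comparison described above can be sketched roughly as follows. This is an illustration only: it models each replication policy as a plain table-name regular expression via java.util.regex, whereas Hive's actual matching goes through ReplScope. The class and method names below are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not Hive code: diff two table-level policies.
class ReplPolicyDiffSketch {
    /** Builds a case-insensitive, whole-name matcher from a table-name regex (an assumed policy model). */
    static Predicate<String> policy(String tableRegex) {
        Pattern p = Pattern.compile(tableRegex, Pattern.CASE_INSENSITIVE);
        return name -> p.matcher(name).matches();
    }

    /** Tables matched by the current policy but not the previous one: candidates for bootstrap. */
    static List<String> tablesToBootstrap(List<String> tables,
                                          Predicate<String> previous,
                                          Predicate<String> current) {
        return tables.stream()
                .filter(t -> current.test(t) && !previous.test(t))
                .collect(Collectors.toList());
    }

    /** Tables matched by the previous policy but not the current one: candidates to drop on load. */
    static List<String> tablesToDrop(List<String> tables,
                                     Predicate<String> previous,
                                     Predicate<String> current) {
        return tables.stream()
                .filter(t -> previous.test(t) && !current.test(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tables = List.of("t1", "t2", "s1");
        Predicate<String> previous = policy("t.*");
        Predicate<String> current = policy("t1|s.*");
        System.out.println("bootstrap: " + tablesToBootstrap(tables, previous, current));
        System.out.println("drop: " + tablesToDrop(tables, previous, current));
    }
}
```

In this example, s1 is newly included and would be bootstrapped, t2 is newly excluded and would be dropped, and t1 continues with incremental replication.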



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
