hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Work logged] (HIVE-20968) Support conversion of managed to external where location set was not owned by hive
Date Tue, 09 Apr 2019 09:54:01 GMT


ASF GitHub Bot logged work on HIVE-20968:

                Author: ASF GitHub Bot
            Created on: 09/Apr/19 09:53
            Start Date: 09/Apr/19 09:53
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #588: HIVE-20968 : Support conversion of managed to external where location set was not owned by hive

 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/
 @@ -84,6 +84,34 @@ public TableExport(Paths paths, TableSpec tableSpec, ReplicationSpec replication
     this.conf = conf;
     this.paths = paths;
     this.mmCtx = mmCtx;
+    this.replicationSpec.setEventBasedOwnershipCheck(false);
+    setPathOwnedByHive(this.replicationSpec, tableSpec.tableHandle.getDataLocation(), db.getConf());
+  }
+  public static void setPathOwnedByHive(ReplicationSpec replicationSpec, Path location, HiveConf conf) {
+    // For incremental load path, this flag should be set using the owner name in the event.
+    if (replicationSpec == null || !replicationSpec.isInReplicationScope() ||
+            replicationSpec.isEventBasedOwnershipCheck()) {
+      return;
+    }
+    // If the table path or path of any of the partitions is not owned by hive,
+    // then table location not owned by hive for whole table.
+    if (!replicationSpec.isPathOwnedByHive()) {
+"Path is not owned by hive user for table or some partition. No need to check further.");
+      return;
+    }
+    try {
+      FileStatus fileStatus = location.getFileSystem(conf).getFileStatus(location);
+      String hiveOwner = conf.get(HiveConf.ConfVars.STRICT_MANAGED_TABLES_MIGRARTION_OWNER.varname,
+      replicationSpec.setPathOwnedByHive(hiveOwner.equals(fileStatus.getOwner()));
 Review comment:
   Is user name case sensitive? If not, we need to use equalsIgnoreCase.
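
A minimal sketch of the two comparison options raised in the review comment, in plain Java with no Hadoop dependency. The class `OwnerCheck`, the method `isOwnedBy`, and the `caseSensitive` parameter are hypothetical and not part of the patch; whether owner names should be treated as case sensitive is exactly the open question, so the sketch leaves it to the caller. It also guards against a null owner, which the bare `hiveOwner.equals(...)` call would not.

```java
public class OwnerCheck {
    // Hypothetical helper, not part of HIVE-20968: compares the configured
    // owner name with the owner reported for the path.
    static boolean isOwnedBy(String expectedOwner, String actualOwner, boolean caseSensitive) {
        // Treat a missing value on either side as "not owned".
        if (expectedOwner == null || actualOwner == null) {
            return false;
        }
        return caseSensitive
                ? expectedOwner.equals(actualOwner)
                : expectedOwner.equalsIgnoreCase(actualOwner);
    }

    public static void main(String[] args) {
        System.out.println(isOwnedBy("hive", "hive", true));   // true
        System.out.println(isOwnedBy("hive", "Hive", true));   // false
        System.out.println(isOwnedBy("hive", "Hive", false));  // true
    }
}
```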
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

Issue Time Tracking

    Worklog Id:     (was: 224871)
    Time Spent: 50m  (was: 40m)

> Support conversion of managed to external where location set was not owned by hive
> ----------------------------------------------------------------------------------
>                 Key: HIVE-20968
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 4.0.0
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: DR, pull-request-available
>         Attachments: HIVE-20968.01.patch
>          Time Spent: 50m
>  Remaining Estimate: 0h
> As per migration rule, if a location is outside the default managed table directory and
the location is not owned by "hive" user, then it should be converted to external table after
>  So, the same rule is applicable for Hive replication where the data of source managed
table is residing outside the default warehouse directory and is not owned by "hive" user.
>  During this conversion, the path should be preserved in target as well so that failover
works seamlessly.
>  # If the table location is outside the hive warehouse and is not owned by hive, then the table at target will be converted to an external table. The location cannot be retained as is; it will be retained relative to the hive external warehouse directory.
>  # As the table is not an external table at source, only the data added through events will be replicated.
>  # The ownership of the location will be stored in the create table event and will be
used to compare it with strict.managed.tables.migration.owner to decide if the flag in replication
scope can be set. This flag is used to convert the managed table to external table at target.
> Some scenarios need to be blocked if the database is set for replication from a cluster with a non-strict managed table setting to a cluster with strict managed tables.
> 1. Block alter table / partition set location for database with source of replication
set for managed tables
> 2. If the user manually changes the ownership of the location, hive replication may go to a non-recoverable state.
> 3. Block add partition if the ownership of the partition location differs from that of the table location for managed tables.
> 4. User needs to set strict.managed.tables.migration.owner along with dump command (default
to hive user). This value will be used during dump to decide the ownership which will be used
during load to decide the table type. The location owner information can be stored in the
events during create table. The flag can be stored in replication spec. Check other such configs
used in upgrade tool.
> 5. The replication flow also sets the additional parameter "external.table.purge"="true", only for migration to external table.
> 6. Block conversion from managed to external and vice versa. Pass some flag in upgrade
flow to allow this conversion during upgrade flow.
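
The conversion rule described in the issue can be sketched as follows. This is an illustrative, self-contained Java example under stated assumptions, not the Hive implementation: `MigrationRule`, `targetType`, and the plain string path handling are hypothetical, and the ownership comparison is shown as an exact match even though the review comment above leaves case sensitivity open.

```java
public class MigrationRule {
    enum TableType { MANAGED, EXTERNAL }

    // Hypothetical helper: a source managed table becomes EXTERNAL at the
    // target only when its location is outside the default managed table
    // directory AND is not owned by the configured migration owner
    // (strict.managed.tables.migration.owner, "hive" by default per the issue).
    static TableType targetType(String tableLocation, String managedWarehouseDir,
                                String locationOwner, String migrationOwner) {
        // Append a trailing slash so "/warehouse/managed2" is not treated
        // as being inside "/warehouse/managed".
        String root = managedWarehouseDir.endsWith("/")
                ? managedWarehouseDir
                : managedWarehouseDir + "/";
        boolean insideWarehouse = tableLocation.startsWith(root);
        boolean ownedByHive = migrationOwner.equals(locationOwner);
        return (!insideWarehouse && !ownedByHive) ? TableType.EXTERNAL : TableType.MANAGED;
    }

    public static void main(String[] args) {
        // Outside the warehouse and owned by another user: converted.
        System.out.println(targetType("/data/sales", "/warehouse/managed", "etl", "hive"));
        // Outside the warehouse but owned by hive: stays managed.
        System.out.println(targetType("/data/sales", "/warehouse/managed", "hive", "hive"));
    }
}
```

In the actual patch the ownership result is carried as a flag on the replication spec during dump and consulted during load; the sketch collapses both steps into one function for readability.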

This message was sent by Atlassian JIRA
