hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id
Date Fri, 20 Dec 2019 10:36:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361577 ]

ASF GitHub Bot logged work on HIVE-21213:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Dec/19 10:34
            Start Date: 20/Dec/19 10:34
    Worklog Time Spent: 10m 
      Work Description: ashutosh-bapat commented on pull request #587: HIVE-21213 : Acid table bootstrap replication needs to handle directory created by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315778
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##########
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
     String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
     Path destination = destRoot;
     for (String subDir: subDirs) {
-      destination = new Path(destination, subDir);
+      // If the directory is created by compactor, then the directory will have the transaction id also.
+      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
+      // aborted or live txn at target.
+      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
+      // can be created. The validity txn id can be removed from the directory name.
+      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
+      if (AcidUtils.getVisibilityTxnId(subDir) > 0) {
 
 Review comment:
   Thanks for the explanation.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 361577)
    Time Spent: 1h 40m  (was: 1.5h)

> Acid table bootstrap replication needs to handle directory created by compaction with txn id
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21213
>                 URL: https://issues.apache.org/jira/browse/HIVE-21213
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, HiveServer2, repl
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, HIVE-21213.03.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory name. This
> isolates queries from reading the directory until compaction has finished and
> avoids the compactor marking used earlier. In the case of replication, during bootstrap, the
> directory is copied as is, with the same name, from the source to the destination cluster. But a
> directory created by compaction with a txn id cannot be copied this way, because the txn list at the
> target may differ from the one at the source: a txn id that is valid at the source may belong to an
> aborted txn at the target. So conversion logic is required to create a new directory with a valid
> txn at the target and dump the data to the newly created directory.
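
The conversion described above, as applied in the bootstrap case by the quoted diff (drop the compactor's txn id from each path component so that only the write id remains), can be sketched roughly as follows. This is a standalone illustration, not the code from the pull request: the class and helper names are hypothetical, plain strings stand in for Hadoop's Path, and the "_v" followed by digits suffix is an assumption about how the compactor names its output directories.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VisibilitySuffixSketch {
    // ASSUMPTION: a compactor-created directory ends in "_v" followed by the txn id,
    // for example "base_0000005_v0000012". This pattern is illustrative only.
    private static final Pattern VISIBILITY_SUFFIX = Pattern.compile("_v(\\d+)$");

    // Hypothetical stand-in for the visibility txn id lookup used in the diff:
    // returns the txn id encoded in the directory name, or -1 if there is none.
    static long visibilityTxnId(String dirName) {
        Matcher m = VISIBILITY_SUFFIX.matcher(dirName);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Drops the txn id suffix so that only the write id part of the name remains.
    static String stripVisibilitySuffix(String dirName) {
        return visibilityTxnId(dirName) > 0
                ? VISIBILITY_SUFFIX.matcher(dirName).replaceFirst("")
                : dirName;
    }

    // Builds the destination path component by component, mirroring the loop in
    // the quoted diff but using strings instead of Path objects.
    static String copyDestination(String destRoot, String subDirPath) {
        StringBuilder destination = new StringBuilder(destRoot);
        for (String subDir : subDirPath.split("/")) {
            if (subDir.isEmpty()) {
                continue; // skip empty components from leading or doubled separators
            }
            destination.append('/').append(stripVisibilitySuffix(subDir));
        }
        return destination.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyDestination("/warehouse/t", "p=1/base_0000005_v0000012"));
    }
}
```

Under these assumptions, directories that were not written by the compactor pass through unchanged, so the same loop handles both kinds of path. The incremental load flow mentioned in the TODO is not covered by this sketch.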



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
