Date: Fri, 20 Dec 2019 10:36:00 +0000 (UTC)
From: "ASF GitHub Bot (Jira)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Work logged] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

     [ https://issues.apache.org/jira/browse/HIVE-21213?focusedWorklogId=361577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361577 ]

ASF GitHub Bot logged work
on HIVE-21213:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Dec/19 10:34
            Start Date: 20/Dec/19 10:34
    Worklog Time Spent: 10m
      Work Description: ashutosh-bapat commented on pull request #587: HIVE-21213 : Acid table bootstrap replication needs to handle directory created by compaction with txn id
URL: https://github.com/apache/hive/pull/587#discussion_r360315778

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
 ##########
 @@ -463,7 +464,29 @@ public static Path getCopyDestination(ReplChangeManager.FileInfo fileInfo, Path
      String[] subDirs = fileInfo.getSubDir().split(Path.SEPARATOR);
      Path destination = destRoot;
      for (String subDir: subDirs) {
 -      destination = new Path(destination, subDir);
 +      // If the directory is created by compactor, then the directory will have the transaction id also.
 +      // In case of replication, the same txn id can not be used at target, as the txn with same id might be a
 +      // aborted or live txn at target.
 +      // In case of bootstrap load, we copy only the committed data, so the directory with only write id
 +      // can be created. The validity txn id can be removed from the directory name.
 +      // TODO : Support for incremental load flow. This can be done once replication of compaction is decided.
 +      if (AcidUtils.getVisibilityTxnId(subDir) > 0) {

 Review comment:
   Thanks for the explanation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 361577)
    Time Spent: 1h 40m  (was: 1.5h)

> Acid table bootstrap replication needs to handle directory created by compaction with txn id
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21213
>                 URL: https://issues.apache.org/jira/browse/HIVE-21213
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, HiveServer2, repl
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, HIVE-21213.03.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory name. This is used to isolate the queries from reading the directory until compaction has finished and to avoid the compactor marking used earlier. In case of replication, during bootstrap, the directory is copied as is, with the same name, from the source to the destination cluster. But a directory created by compaction with a txn id cannot be copied this way, because the txn list at the target may differ from the one at the source: a txn id that is valid at the source may be an aborted txn at the target. So conversion logic is required to create a new directory with a valid txn at the target and dump the data into the newly created directory.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
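
The renaming described in the issue and in the quoted diff can be sketched in plain Java as follows. This is only an illustration, not the code from the pull request: it assumes that compactor output directories end in a `_v<txnId>` suffix (for example `base_0000005_v0000012`), and the class and method names (`VisibilitySuffixSketch`, `visibilityTxnId`, `stripVisibilityTxnId`, `copyDestination`) are hypothetical stand-ins rather than Hive APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VisibilitySuffixSketch {
    // Assumption: the compactor appends "_v" plus the txn id digits to the directory name.
    private static final Pattern VISIBILITY_SUFFIX = Pattern.compile("_v(\\d+)$");

    // Returns the txn id embedded in the directory name, or -1 if there is none.
    static long visibilityTxnId(String dirName) {
        Matcher m = VISIBILITY_SUFFIX.matcher(dirName);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Drops the "_v<txnId>" suffix so that only the write id part of the name remains.
    static String stripVisibilityTxnId(String dirName) {
        return VISIBILITY_SUFFIX.matcher(dirName).replaceFirst("");
    }

    // Builds the destination path one sub-directory at a time, as the quoted loop does,
    // removing the txn id from any directory that carries one.
    static String copyDestination(String destRoot, String subDirPath) {
        StringBuilder destination = new StringBuilder(destRoot);
        for (String subDir : subDirPath.split("/")) {
            if (subDir.isEmpty()) {
                continue; // skip empty segments produced by leading or doubled separators
            }
            String name = visibilityTxnId(subDir) > 0 ? stripVisibilityTxnId(subDir) : subDir;
            destination.append('/').append(name);
        }
        return destination.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyDestination("/warehouse/t", "p=1/base_0000005_v0000012"));
    }
}
```

Because bootstrap load copies only committed data, dropping the suffix at the target avoids tying the copied directory to a txn id that may be aborted or still open there. The incremental load flow is left open, as the TODO in the diff notes.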