From: "Impala Public Jenkins (Code Review)"
To: Zoltan Borok-Nagy, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Date: Tue, 14 Jul 2020 12:53:54 +0000
Subject: [Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)
Message-Id: <202007141253.06ECrs5f019330@ip-10-146-233-104.ec2.internal>

Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)
......................................................................

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves this by assigning a unique row-id to each row and by
maintaining two sets of files in a table. The first set is in the
base/delta directories and contains the INSERTed rows. The second set
is in the delete-delta directories and contains the DELETEd rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like this, for example:
 * full_acid/delta_0000001_0000001_0000/0000_0
 * full_acid/delta_0000002_0000002_0000/0000_0
 * full_acid/delete_delta_0000003_0000003_0000/0000_0

During scanning we need to return the INSERTed rows minus the DELETEd
rows. This patch implements that by creating an ANTI JOIN between the
INSERT and DELETE sets. It is a planner-only modification. Every HDFS
SCAN that scans a full ACID table that also has deleted rows is
converted to two HDFS SCANs, one for the INSERT deltas and one for the
DELETE deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution
mode is created above them.

Later we can add support for other distribution modes if performance
requires it. E.g. if there are too many deleted rows, we are probably
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types, i.e. we cannot
select nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Reviewed-on: http://gerrit.cloudera.org:8080/16082
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/custom_cluster/test_local_catalog.py
M tests/query_test/test_acid.py
27 files changed, 1,710 insertions(+), 151 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: Zoltan Borok-Nagy
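
Illustrative note on the scan strategy in the commit message above: the "INSERTed rows minus DELETEd rows" semantics of a LEFT ANTI HASH JOIN can be sketched in a few lines of Python. This is only a toy model of the join semantics, not Impala's planner or executor code. The `Row` type, the `left_anti_hash_join` function, and the use of a single opaque value as the row-id are assumptions made for illustration.

```python
from typing import Hashable, Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    # Hypothetical stand-in for the unique ACID row-id; any hashable value works here.
    row_id: Hashable
    values: tuple


def left_anti_hash_join(
    inserted: Iterable[Row], deleted_row_ids: Iterable[Hashable]
) -> Iterator[Row]:
    """Yield every INSERTed row whose row-id has no match in the DELETE set."""
    # Build side: the DELETE set is materialized into a hash set. In a
    # BROADCAST plan, each node would hold a full copy of this set.
    build = set(deleted_row_ids)
    # Probe side: stream the INSERTed rows and keep only the non-matching ones.
    for row in inserted:
        if row.row_id not in build:
            yield row


if __name__ == "__main__":
    # Two inserts, then an UPDATE of row "r2" modeled as DELETE + INSERT.
    insert_deltas = [
        Row("r1", ("alice", 10)),
        Row("r2", ("bob", 20)),
        Row("r3", ("bob", 25)),  # new version written by the UPDATE
    ]
    delete_deltas = ["r2"]  # old version removed by the UPDATE
    for row in left_anti_hash_join(insert_deltas, delete_deltas):
        print(row.row_id, row.values)
```

Because the whole DELETE set is held in memory as the build side, this sketch also shows why a very large number of deleted rows could make a broadcast plan expensive, which is the motivation the commit message gives for possibly supporting PARTITIONED distribution later.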