Message-Id: <201710100130.v9A1UXg2004116@ip-10-146-233-104.ec2.internal>
Date: Tue, 10 Oct 2017 01:30:33 +0000
From: "Impala Public Jenkins (Code Review)"
To: Quanlong Huang, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org, huangquanlong@gmail.com
Subject: [Impala-ASF-CR] IMPALA-5448: fix invalid number of splits reported in Parquet scan node
Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm
X-Gerrit-MessageType: merged
X-Gerrit-PatchSet: 8
X-Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
X-Gerrit-Change-Number: 8147
X-Gerrit-Commit: 192cd96d9ee3be1cd3e7c3ad774bf8d5c8efb1c0
User-Agent: Gerrit/2.14.2

Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet scan node
......................................................................

IMPALA-5448: fix invalid number of splits reported in Parquet scan node

Parquet splits with multiple columns are marked as completed by using
HdfsScanNodeBase::RangeComplete(). It counts the file types in duplicate,
once per column codec type. Thus the reported number of Parquet splits is
the real count multiplied by the number of materialized columns.

Furthermore, the Parquet definition allows mixed compression codecs on
different columns. This is handled in this patch as well. A Parquet file
using the gzip and snappy compression codecs will be reported as:
  FileFormats: PARQUET/(GZIP,SNAPPY):1

This patch introduces a compression types set for the above cases.

Testing:
Add end-to-end tests handling Parquet files with all columns compressed
in snappy, and handling Parquet files with multiple compression codecs.

Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Reviewed-on: http://gerrit.cloudera.org:8080/8147
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
A testdata/multi_compression_parquet_data/README
A testdata/multi_compression_parquet_data/tinytable_0_gzip_snappy.parq
A testdata/multi_compression_parquet_data/tinytable_1_snappy_gzip.parq
A testdata/workloads/functional-query/queries/QueryTest/hdfs_parquet_scan_node_profile.test
M tests/query_test/test_scanners.py
7 files changed, 132 insertions(+), 13 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Tim Armstrong
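
For readers of the archive, the counting problem described in the commit message can be illustrated with a small sketch. This is not the Impala C++ code. The function names, the (file format, per-column codecs) input shape, and the single-codec output form "PARQUET/SNAPPY:1" are assumptions made for illustration. Only the multi-codec form "PARQUET/(GZIP,SNAPPY):1" comes from the message above.

```python
from collections import Counter


def count_per_column(splits):
    """Hypothetical model of the old behaviour: one count per column codec.

    A split with N materialized columns is counted N times.
    """
    counts = Counter()
    for file_format, column_codecs in splits:
        for codec in column_codecs:
            counts[(file_format, codec)] += 1
    return counts


def count_per_split(splits):
    """Hypothetical model of the fix: one count per split.

    The key is the file format plus the *set* of codecs used by its columns.
    """
    counts = Counter()
    for file_format, column_codecs in splits:
        counts[(file_format, frozenset(column_codecs))] += 1
    return counts


def format_counts(counts):
    """Render per-split counts; the single-codec form is an assumption."""
    parts = []
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], sorted(kv[0][1])))
    for (file_format, codecs), n in ordered:
        names = sorted(codecs)
        codec_str = names[0] if len(names) == 1 else "(" + ",".join(names) + ")"
        parts.append(f"{file_format}/{codec_str}:{n}")
    return " ".join(parts)


# One Parquet split with three snappy columns, one with mixed codecs.
all_snappy = [("PARQUET", ["SNAPPY", "SNAPPY", "SNAPPY"])]
mixed = [("PARQUET", ["GZIP", "SNAPPY"])]

print(count_per_column(all_snappy)[("PARQUET", "SNAPPY")])  # inflated by column count
print(format_counts(count_per_split(all_snappy)))
print(format_counts(count_per_split(mixed)))
```

Keying on a set of codecs rather than a single codec lets a mixed-codec file map to one entry, so each split contributes exactly one count regardless of how many columns it materializes.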