Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 673FB200CC8 for ; Fri, 30 Jun 2017 08:13:51 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 655B0160BF7; Fri, 30 Jun 2017 06:13:51 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id AC4A2160BED for ; Fri, 30 Jun 2017 08:13:50 +0200 (CEST) Received: (qmail 24923 invoked by uid 500); 30 Jun 2017 06:13:49 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 24912 invoked by uid 99); 30 Jun 2017 06:13:49 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 30 Jun 2017 06:13:49 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 31751C0F8E for ; Fri, 30 Jun 2017 06:13:49 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 22IfuOGPmZ8h for ; Fri, 30 Jun 2017 06:13:47 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id CC39D5FCDA for ; Fri, 30 Jun 2017 06:13:46 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v5U6Dk8l029801; Fri, 30 Jun 2017 06:13:46 GMT Message-Id: <201706300613.v5U6Dk8l029801@ip-10-146-233-104.ec2.internal> Date: Fri, 30 Jun 2017 06:13:44 +0000 From: "Alex Behm (Code Review)" To: Dan Hecht , Tim Armstrong , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Marcel Kornacker , Henry Robinson , Sailesh Mukil , anujphadke Reply-To: alex.behm@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-3905=3A_HdfsScanner=3A=3AGetNext=28=29_for_Avro=2C_RC=2C_and_Seq_scans=2E=0A?= X-Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 X-Gerrit-ChangeURL: X-Gerrit-Commit: f50cf0d2e591c9fe7f6ba4b5d75049635dec325f In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Fri, 30 Jun 2017 06:13:51 -0000 Hello Dan Hecht, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6527 to look at the new patch set (#9). Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. ...................................................................... IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans. Implements HdfsScanner::GetNext() for the Avro, RC File, and Sequence File scanners. Changes ProcessSplit() to repeatedly call GetNext() to share the core scanning code between the legacy ProcessSplit() interface (ProcessSplit()) and the new GetNext() interface. Summary of changes: - Slightly change code flow for initial scan range that only parses the file header. The new code sets 'only_parsing_header_' in Open() and then honors that flag in GetNextInternal(). Before, all the logic was inside ProcessSpit(). - Replace 'finished_' with 'eos_'. - Add a RowBatch parameter to various functions. - Change Close() to free all resources when a nullptr RowBatch is passed. Testing: - Exhaustive tests passed on debug - Core tests passed on asan - TODO: Perf testing on cluster Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/base-sequence-scanner.h M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-avro-scanner-ir.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-avro-scanner.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-rcfile-scanner.cc M be/src/exec/hdfs-rcfile-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M be/src/exec/hdfs-sequence-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/exec/kudu-scan-node.cc M be/src/exec/scan-node.h M be/src/util/blocking-queue.h M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test 26 files changed, 709 insertions(+), 824 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/27/6527/9 -- To view, visit http://gerrit.cloudera.org:8080/6527 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke