impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Impala Public Jenkins (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
Date Sat, 01 Jul 2017 21:59:35 GMT
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.
......................................................................


IMPALA-3905: HdfsScanner::GetNext() for Avro, RC, and Seq scans.

Implements HdfsScanner::GetNext() for the Avro, RC File, and
Sequence File scanners. Changes ProcessSplit() to repeatedly call
GetNext() to share the core scanning code between the legacy
ProcessSplit() interface (ProcessSplit()) and the new GetNext()
interface.

Summary of changes:
- Slightly change code flow for initial scan range that
  only parses the file header. The new code sets
  'only_parsing_header_' in Open() and then honors
  that flag in GetNextInternal(). Before, all the logic
  was inside ProcessSpit().
- Replace 'finished_' with 'eos_'.
- Add a RowBatch parameter to various functions.
- Change Close() to free all resources when a nullptr
  RowBatch is passed.

Testing:
- Exhaustive tests passed on debug
- Core tests passed on asan
- TODO: Perf testing on cluster

Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Reviewed-on: http://gerrit.cloudera.org:8080/6527
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/scan-node.h
M be/src/util/blocking-queue.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
26 files changed, 709 insertions(+), 824 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie18f57b0d3fe0052a8ccd361b6a5fcdf979d0669
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: anujphadke <aphadke@cloudera.com>

Mime
View raw message