impala-dev mailing list archives

From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3905: Add single-threaded scan node.
Date Fri, 26 Aug 2016 21:24:26 GMT
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4137

Change subject: IMPALA-3905: Add single-threaded scan node.
......................................................................

IMPALA-3905: Add single-threaded scan node.

Adds a new single-threaded scan node HdfsScanNodeMt that
materializes tuples in the thread calling GetNext().
The new scan node uses the HdfsScanner::GetNext() interface,
which is currently implemented only for Parquet.
As before, I/O is performed asynchronously via the I/O manager.
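
To illustrate the pull-based model described above, here is a minimal sketch
in which the scan node starts no threads of its own and simply forwards
GetNext() to its scanner, so rows are materialized in the calling thread.
ToyScanner, ToyScanNodeMt and RowBatch are hypothetical stand-ins, not the
classes in the patch, and the asynchronous I/O manager is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch; not Impala's RowBatch.
struct RowBatch {
  std::vector<int> rows;
};

// Toy scanner that materializes rows on demand, in the caller's thread.
class ToyScanner {
 public:
  explicit ToyScanner(std::vector<int> data) : data_(std::move(data)) {}

  // Appends rows to 'batch' until it holds 'capacity' rows or the input
  // runs out. Sets '*eos' to true once the input is exhausted.
  void GetNext(RowBatch* batch, std::size_t capacity, bool* eos) {
    assert(batch != nullptr && eos != nullptr);
    while (pos_ < data_.size() && batch->rows.size() < capacity) {
      batch->rows.push_back(data_[pos_++]);
    }
    *eos = (pos_ == data_.size());
  }

 private:
  std::vector<int> data_;
  std::size_t pos_ = 0;
};

// Toy single-threaded scan node: no scanner threads, no row batch queue.
class ToyScanNodeMt {
 public:
  explicit ToyScanNodeMt(ToyScanner scanner) : scanner_(std::move(scanner)) {}

  // All work happens synchronously inside this call.
  void GetNext(RowBatch* batch, std::size_t capacity, bool* eos) {
    scanner_.GetNext(batch, capacity, eos);
  }

 private:
  ToyScanner scanner_;
};
```

A caller would loop on GetNext() with a fresh batch until eos is set.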

The new scan node is enabled if the mt_dop query option is
set to a value greater than 1. Otherwise, the existing
multi-threaded scan node is used.
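
A rough sketch of how such a switch could look, assuming a factory function
that picks the implementation from the query option. CreateScanNode and the
three toy classes are hypothetical names for illustration; the threshold is
copied from the description above, and the actual selection code in the
patch may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical common base class for both scan node flavors.
struct ToyScanNodeBase {
  virtual ~ToyScanNodeBase() = default;
  virtual std::string Name() const = 0;
};

struct ToyMultiThreadedScanNode : ToyScanNodeBase {
  std::string Name() const override { return "multi-threaded"; }
};

struct ToySingleThreadedScanNode : ToyScanNodeBase {
  std::string Name() const override { return "single-threaded"; }
};

// Picks the scan node from the mt_dop value, using the rule as stated in
// the description: a value greater than 1 selects the single-threaded node.
std::unique_ptr<ToyScanNodeBase> CreateScanNode(int mt_dop) {
  assert(mt_dop >= 0);
  if (mt_dop > 1) {
    return std::unique_ptr<ToyScanNodeBase>(new ToySingleThreadedScanNode());
  }
  return std::unique_ptr<ToyScanNodeBase>(new ToyMultiThreadedScanNode());
}
```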

The changes are mostly a refactoring of the existing multi-threaded
scan node to separate out the common code between the existing
multi-threaded scan node and the new single-threaded one.

Summary of changes:
- Move code from hdfs-scan-node.h/cc into a new hdfs-scan-node-base.h/cc
- Add new single-threaded scan node in hdfs-scan-node-mt.h/cc
- Both scan nodes inherit from HdfsScanNodeBase
- Rework the allocation of template tuples such that the memory is drawn
  from a new mem pool in the scanners, and that each scanner clones the
  partition exprs contexts. Before, the memory was taken from the parent
  scan node's mem pool, and there was only one instance of the partition
  exprs contexts. Access to them was protected by a lock, but not in all
  instances, so their use was not always obviously correct.
  The change in this patch makes thread safety obvious and helps move
  a lock into the multi-threaded scan node which would otherwise have
  to remain in the HdfsScanNodeBase class.
- Simplify a couple of loops with C++11 for-each
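
The per-scanner cloning of partition exprs contexts can be sketched roughly
as follows. ToyExprContext and ToyScanner are hypothetical stand-ins, and the
mem pool is omitted; the point is only that each scanner evaluates its own
copies, so building a template tuple touches no shared mutable state and
needs no lock.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a stateful partition expr context.
class ToyExprContext {
 public:
  explicit ToyExprContext(int partition_value) : value_(partition_value) {}

  // Returns an independent copy with fresh evaluation state.
  std::unique_ptr<ToyExprContext> Clone() const {
    return std::unique_ptr<ToyExprContext>(new ToyExprContext(value_));
  }

  // Evaluation mutates internal state, which is why sharing one instance
  // across threads would need synchronization.
  int Evaluate() {
    ++num_evals_;
    return value_;
  }

  int num_evals() const { return num_evals_; }

 private:
  int value_;
  int num_evals_ = 0;
};

// Toy scanner that owns clones of the scan node's contexts.
class ToyScanner {
 public:
  explicit ToyScanner(
      const std::vector<std::unique_ptr<ToyExprContext>>& node_ctxs) {
    for (const auto& ctx : node_ctxs) ctxs_.push_back(ctx->Clone());
  }

  // Builds a template tuple from this scanner's private contexts only.
  std::vector<int> BuildTemplateTuple() {
    std::vector<int> tuple;
    for (auto& ctx : ctxs_) tuple.push_back(ctx->Evaluate());
    return tuple;
  }

 private:
  std::vector<std::unique_ptr<ToyExprContext>> ctxs_;
};
```

In this sketch the scan node's original contexts are never evaluated by a
scanner, so two scanners built from the same node cannot interfere with each
other.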

Testing: A private core/hdfs run passed. I ran TPC-H/DS and test_scanners.py
on ASAN several times locally.

Change-Id: I27b26b878886163f2b8312d75c8ad296826a8303
---
M be/src/exec/CMakeLists.txt
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-lzo-text-scanner.cc
M be/src/exec/hdfs-lzo-text-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
A be/src/exec/hdfs-scan-node-base.cc
A be/src/exec/hdfs-scan-node-base.h
A be/src/exec/hdfs-scan-node-mt.cc
A be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exprs/expr-context.h
M be/src/runtime/tuple.h
M tests/query_test/test_partitioning.py
30 files changed, 1,793 insertions(+), 1,459 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/4137/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4137
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I27b26b878886163f2b8312d75c8ad296826a8303
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
