impala-reviews mailing list archives

From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.
Date Thu, 23 Feb 2017 00:30:13 GMT
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

IMPALA-3905: Implements HdfsScanner::GetNext() for text scans.

Implements HdfsTextScanner::GetNext() and changes
ProcessSplit() to repeatedly call GetNext(), so that the core
scanning code is shared between the legacy ProcessSplit()
interface and the new GetNext() interface.
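A minimal C++ sketch of this refactoring, assuming toy types: `ToyRowBatch`, `ToyTextScanner`, and the vector standing in for the row-batch queue are hypothetical and much simpler than Impala's real classes. It shows only the control flow: the scanning logic lives in GetNext(), and ProcessSplit() becomes a loop around it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch: holds row values up to a capacity.
struct ToyRowBatch {
  std::vector<int> rows;
  std::size_t capacity = 4;
  bool AtCapacity() const { return rows.size() >= capacity; }
};

// Hypothetical scanner; not Impala's HdfsTextScanner.
class ToyTextScanner {
 public:
  explicit ToyTextScanner(std::vector<int> input) : input_(std::move(input)) {}

  // Core scanning code: fills at most one batch per call and sets *eos
  // once the input is exhausted.
  void GetNext(ToyRowBatch* batch, bool* eos) {
    while (!batch->AtCapacity() && pos_ < input_.size()) {
      batch->rows.push_back(input_[pos_++]);
    }
    *eos = pos_ >= input_.size();
  }

  // Legacy interface: drives GetNext() until end of stream and pushes each
  // non-empty batch onto 'queue' (a stand-in for the row-batch queue).
  // Returns the number of batches this call produced.
  std::size_t ProcessSplit(std::vector<ToyRowBatch>* queue) {
    std::size_t produced = 0;
    bool eos = false;
    while (!eos) {
      ToyRowBatch batch;
      GetNext(&batch, &eos);
      if (!batch.rows.empty()) {
        queue->push_back(std::move(batch));
        ++produced;
      }
    }
    return produced;
  }

 private:
  std::vector<int> input_;
  std::size_t pos_ = 0;
};
```

With ten input rows and a batch capacity of four, ProcessSplit() produces three batches of 4, 4, and 2 rows; a GetNext()-based caller would instead pull those same batches one at a time.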

These changes were tricky:
- The scanner used to rely on the ability to attach a batch
  to the row-batch queue to free resources
- This patch attempts to preserve the resource-freeing behavior
  by releasing resources as soon as they are complete
- In particular, the scanner attempts to skip corrupt/invalid
  data blocks, and memory must not accumulate while it does so
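To illustrate why eager release matters when skipping corrupt blocks, here is a toy C++ model. All names are hypothetical and it only counts bytes rather than allocating; it compares the peak memory held by a scanner when skipped blocks are freed immediately versus held until the end of the scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical byte counter for buffers held by a scanner.
class BufferTracker {
 public:
  void Acquire(std::size_t bytes) {
    live_ += bytes;
    peak_ = std::max(peak_, live_);
  }
  void Release(std::size_t bytes) { live_ -= bytes; }
  std::size_t live() const { return live_; }
  std::size_t peak() const { return peak_; }

 private:
  std::size_t live_ = 0;
  std::size_t peak_ = 0;
};

// Hypothetical description of one data block in a scan range.
struct BlockInfo {
  std::size_t bytes;
  bool corrupt;
};

// Returns the peak number of bytes the scanner holds while reading 'blocks'.
// Valid blocks are handed off once their rows are emitted. Corrupt blocks
// are skipped: released immediately if 'release_eagerly', otherwise kept
// until the end of the scan.
std::size_t PeakScannerBytes(const std::vector<BlockInfo>& blocks,
                             bool release_eagerly) {
  BufferTracker tracker;
  std::size_t deferred = 0;
  for (const BlockInfo& b : blocks) {
    tracker.Acquire(b.bytes);  // read the block into a buffer
    if (!b.corrupt || release_eagerly) {
      tracker.Release(b.bytes);  // handed off, or skipped and freed at once
    } else {
      deferred += b.bytes;  // skipped, but still held by the scanner
    }
  }
  tracker.Release(deferred);  // end of scan: free whatever is left
  assert(tracker.live() == 0);
  return tracker.peak();
}
```

For a run of three corrupt 100-byte blocks and one valid one, eager release keeps the peak at one block (100 bytes), while deferred release grows it to 300 bytes; with a long run of invalid data the deferred peak grows without bound.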

The other changes are mostly straightforward:
- Add a RowBatch parameter to various functions
- Add a MemPool parameter to various functions for attaching
  the memory of completed resources that may still be referenced
  by returned batches
- Change Close() to free all resources when a nullptr
  RowBatch is passed
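The MemPool and Close() changes can be sketched with toy types. `ToyMemPool`, `ToyRowBatch`, `ToyScanner`, and their method names are hypothetical and are not Impala's actual signatures; the sketch assumes that memory of completed resources is transferred to the batch's pool, and that Close(nullptr) frees everything the scanner still holds.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MemPool: owns a list of buffers.
class ToyMemPool {
 public:
  void AddBuffer(std::unique_ptr<char[]> buf) { buffers_.push_back(std::move(buf)); }
  // Moves every buffer owned by 'src' into this pool ('src' must not be this).
  void TakeAllFrom(ToyMemPool* src) {
    for (auto& buf : src->buffers_) buffers_.push_back(std::move(buf));
    src->buffers_.clear();
  }
  void FreeAll() { buffers_.clear(); }
  std::size_t num_buffers() const { return buffers_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> buffers_;
};

// Hypothetical batch whose pool keeps referenced memory alive.
struct ToyRowBatch {
  ToyMemPool pool;
};

class ToyScanner {
 public:
  // Reads a block; the scanner owns its buffer until the block completes.
  void ReadBlock(std::size_t bytes) {
    pool_.AddBuffer(std::make_unique<char[]>(bytes));
  }
  // Completed resources that returned rows may still reference are attached
  // to the batch's pool, so they live as long as the batch does.
  void AttachCompletedResources(ToyRowBatch* batch) { batch->pool.TakeAllFrom(&pool_); }
  // With a batch, hand remaining memory to it; with nullptr, free everything.
  void Close(ToyRowBatch* batch) {
    if (batch != nullptr) {
      batch->pool.TakeAllFrom(&pool_);
    } else {
      pool_.FreeAll();
    }
  }
  std::size_t held_buffers() const { return pool_.num_buffers(); }

 private:
  ToyMemPool pool_;
};
```

Transferring ownership to the batch, rather than freeing, keeps buffers alive for as long as returned rows may point into them; passing nullptr covers the case where no batch will be returned and nothing can still reference the memory.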

- Exhaustive tests passed in debug builds
- Core tests passed in ASAN builds
- TODO: Perf testing on cluster

Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
M be/src/exec/
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/
M be/src/exec/
M be/src/exec/hdfs-scan-node.h
M be/src/exec/
M be/src/exec/
M be/src/exec/hdfs-scanner.h
M be/src/exec/
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/
M be/src/exec/scanner-context.h
M fe/src/main/java/org/apache/impala/planner/
15 files changed, 292 insertions(+), 215 deletions(-)

  git pull ssh:// refs/changes/00/6000/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id193aa223434d7cc40061a42f81bbb29dcd0404b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Marcel Kornacker <>
