impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5776: Write partial tuple to the correct mempool
Date Fri, 18 Aug 2017 19:35:06 GMT
Taras Bobrovytsky has uploaded a new patch set (#6).

Change subject: IMPALA-5776: Write partial tuple to the correct mempool
......................................................................

IMPALA-5776: Write partial tuple to the correct mempool

In the text scanner, we were writing the partial tuple variable length
data to data_buffer_pool_ mempool which caused strange behavior, such
as incorrect results.

If we are scanning compressed data, the pool gets attached to the row
batch at the end of a GetNext() call and gets freed before the next
GetNext() call. This is wrong because we expect the data in the partial
tuple to survive between the GetNext() calls. If we are scanning non
compressed data, data_buffer_pool_ never gets cleared and grows over
time until the scanner finishes reading the scan range.

We fix the problem by writing the varlen partial tuple data to
boundary_pool, which is where the constant length partial tuple data is
written. We also make sure that boundary pool does not hold any tuple
data of returned batches by always deep copying it to output batches.

Testing:
- Ran some tests locally on ASAN build.
- Updated test_scanners_fuzz.py to make slightly more significant
  changes to the data files. This change was helpful for finding
  issues while developing this patch.

Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners_fuzz.py
3 files changed, 65 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/7639/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>

Mime
View raw message