impala-dev mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3798: Disable per-split filtering for sequence-based scanners
Date Tue, 28 Jun 2016 21:09:58 GMT
Henry Robinson has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3524

Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................

IMPALA-3798: Disable per-split filtering for sequence-based scanners

If a runtime filter rejects the header split of a sequence-based file
(but not the entire file, which may happen if the filter has not arrived
in time), the scanner never marks all splits for that file complete.
BaseSequenceScanner issues the remaining scan ranges only after parsing
the header split, so when the header split is rejected those ranges are
never processed, and RangeComplete() and AddDiskIoRanges() are never
called for them. Those methods update progress_ and num_files_queued_
respectively. HdfsScanNode::ScannerThread() reads those variables to
decide whether to exit, and as a result spins forever.
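
The hang described above can be illustrated with a toy model. This is
only a sketch under simplifying assumptions, not Impala's code: the class
ToyScanNode, the function scan_file and the single ranges_complete
counter are hypothetical, and the counter stands in for both progress_
and num_files_queued_ for brevity.

```python
from dataclasses import dataclass


@dataclass
class ToyScanNode:
    """Toy bookkeeping only; a hypothetical stand-in, not HdfsScanNode."""
    total_ranges: int         # scan ranges the node expects to finish
    ranges_complete: int = 0  # assumed analogue of progress_

    def range_complete(self) -> None:
        self.ranges_complete += 1

    def can_exit(self) -> bool:
        # Assumed exit test: every expected range was marked complete.
        return self.ranges_complete == self.total_ranges


def scan_file(node: ToyScanNode, ranges_in_file: int,
              header_filtered: bool) -> None:
    # The header split is accounted for whether it is parsed or rejected.
    node.range_complete()
    if header_filtered:
        # Header never parsed, so the remaining ranges are never issued
        # and never marked complete.
        return
    for _ in range(ranges_in_file - 1):
        node.range_complete()
```

In this model a three-range file whose header split is filtered leaves
can_exit() false indefinitely, while a single-range file still finishes.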

This bug therefore only shows up when there is >1 scan range per file.

This patch disables per-split filtering for Avro, RC and sequence files
as a stopgap, pending a permanent fix that marks all scan ranges for a
file as done as soon as one range is filtered out.
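
The two approaches can be contrasted in a short sketch. Everything here
is illustrative: the format labels, per_split_filtering_allowed and
mark_file_done are hypothetical names, not identifiers from the patch.

```python
# Hypothetical labels for the sequence-based formats named above.
SEQUENCE_BASED_FORMATS = frozenset({"AVRO", "RC_FILE", "SEQUENCE_FILE"})


def per_split_filtering_allowed(file_format: str) -> bool:
    """Stopgap idea: never filter individual splits of these formats."""
    return file_format not in SEQUENCE_BASED_FORMATS


def mark_file_done(remaining_ranges: dict, path: str) -> int:
    """Permanent-fix idea: when one range of a file is filtered out,
    treat every outstanding range of that file as complete.

    Returns how many ranges should be added to the progress count.
    """
    return remaining_ranges.pop(path, 0)
```

The stopgap gives up some filtering for these formats; the permanent
fix would keep the filtering and correct the bookkeeping instead.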

Testing:

A custom cluster test is added which disables file filtering, emulating
the race condition that leads to the hang when a query that filters
scan ranges is run. Without the fix, this test hangs; with the fix, the
query completes as expected. MAX_SCAN_RANGE_LENGTH is used to ensure >1
scan range per file.
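
As a rough guide to choosing MAX_SCAN_RANGE_LENGTH for such a test, the
arithmetic can be sketched as below. This assumes a file is simply cut
into ranges of at most that many bytes, ignores HDFS block boundaries,
and treats a value of 0 as "no cap"; num_scan_ranges is a hypothetical
helper, not an Impala function.

```python
import math


def num_scan_ranges(file_size_bytes: int, max_scan_range_length: int) -> int:
    """Approximate scan-range count for one file (block layout ignored)."""
    if max_scan_range_length <= 0:
        # Assumption: 0 means the range length is not capped.
        return 1
    return max(1, math.ceil(file_size_bytes / max_scan_range_length))
```

Any cap smaller than the file size yields more than one range, which is
the condition the test depends on.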

Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-scan-node.cc
A tests/custom_cluster/test_seq_file_filtering.py
3 files changed, 86 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/24/3524/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3524
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
