impala-dev mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-2.6.0 5.8.0) IMPALA-3798: Disable per-split filtering for sequence-based scanners
Date Wed, 29 Jun 2016 04:54:22 GMT
Henry Robinson has submitted this change and it was merged.

Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................


IMPALA-3798: Disable per-split filtering for sequence-based scanners

If a runtime filter rejects a sequence-based format's header split (but
not the entire file, which may happen if the filter has not arrived in
time), the scanner will never mark all splits for that file
complete. This is because BaseSequenceScanner issues scan ranges after
parsing the header splits, and until those ranges are processed,
RangeComplete() and AddDiskIoRanges() will not be called - those methods
update progress_ and num_unqueued_files_
respectively. HdfsScanNode::ScannerThread() reads those variables to
decide whether to exit, and as a result will spin forever.

This bug therefore only shows up when there is >1 scan range per file.
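
The hang condition described above can be illustrated with a deliberately
simplified model. This is a sketch under stated assumptions, not Impala's
code: a single completion counter stands in for both progress_ and
num_unqueued_files_, and the function names are hypothetical.

```python
def completed_ranges(ranges_in_file, header_filtered):
    """Count the scan ranges that get marked complete for one file.

    Simplified model (assumption): the header split is always processed,
    and the remaining ranges are only issued, and therefore only
    completed, if the header split was not rejected by a runtime filter.
    """
    completed = 1  # the header split itself
    if not header_filtered:
        # Follow-on ranges are issued after the header is parsed.
        completed += ranges_in_file - 1
    return completed


def would_hang(ranges_in_file, header_filtered):
    """The scanner thread exits only once every range is complete."""
    return completed_ranges(ranges_in_file, header_filtered) < ranges_in_file


# One range per file: the header split is the whole file, so nothing is
# left outstanding even when that split is filtered.
print(would_hang(1, header_filtered=True))
# Several ranges per file: the follow-on ranges are never issued.
print(would_hang(3, header_filtered=True))
```

In this model the exit condition can only fail when a file has more than
one scan range, which matches the observation above.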

This patch disables per-split filtering for Avro, RC and sequence files
as a stopgap until a permanent fix marks all scan ranges for a file as
done as soon as one range is filtered out.
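
As a rough illustration of the workaround's shape (not the actual change
to hdfs-scan-node.cc), the decision could be sketched as below. The
format strings and the function name are hypothetical placeholders chosen
for this example.

```python
# Hypothetical labels for the sequence-based formats named in this patch.
SEQUENCE_BASED_FORMATS = {"AVRO", "RC_FILE", "SEQUENCE_FILE"}


def should_skip_split(file_format, filter_rejects_split):
    """Decide whether a split may be dropped by a runtime filter.

    Assumption: splits of sequence-based formats are never dropped
    individually, so their header split is always parsed and the
    follow-on scan ranges are always issued.
    """
    if file_format in SEQUENCE_BASED_FORMATS:
        return False
    return filter_rejects_split
```

The cost of this approach is that a rejected split of a sequence-based
file is still scanned, which trades some wasted work for guaranteed
termination.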

Testing:

A custom cluster test is added which disables file filtering, emulating
the race condition that leads to the hang when a query that filters
scan ranges is run. Without the fix, this test hangs; with the fix, the
query completes as expected. MAX_SCAN_RANGE_LENGTH is used to ensure >1
scan range per file.
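
To see why capping the scan range length forces several ranges per file,
here is a back-of-the-envelope sketch. It assumes a file is simply cut
into consecutive pieces of at most the given length and ignores HDFS
block boundaries; the function name is hypothetical.

```python
def num_scan_ranges(file_size_bytes, max_scan_range_length):
    """Number of pieces when a file is cut into ranges of bounded length.

    Assumption: plain ceiling division, with no alignment to HDFS blocks.
    """
    if file_size_bytes <= 0:
        return 0
    if max_scan_range_length <= 0:
        raise ValueError("max_scan_range_length must be positive in this sketch")
    # Ceiling division using integers only.
    return -(-file_size_bytes // max_scan_range_length)
```

Under this model, any limit smaller than the file size yields at least
two ranges, which is the precondition for the hang.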

Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Reviewed-on: http://gerrit.cloudera.org:8080/3526
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Henry Robinson <henry@cloudera.com>
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-scan-node.cc
A tests/custom_cluster/test_seq_file_filtering.py
3 files changed, 86 insertions(+), 10 deletions(-)

Approvals:
  Henry Robinson: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/3526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
