impala-dev mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3764: fuzz test HDFS scanners
Date Tue, 19 Jul 2016 20:57:56 GMT
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-3764: fuzz test HDFS scanners

IMPALA-3764: fuzz test HDFS scanners

This adds a test that performs some simple fuzz testing of HDFS
scanners. It creates a copy of a given HDFS table, with each
file in the table corrupted in a random way: either a single
byte is set to a random value, or the file is truncated to a
random length. It then runs a query that scans the whole table
with several different batch_size settings. I made some effort
to make the failures reproducible by explicitly seeding the
random number generator, and providing a mechanism to override
the seed.
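The corruption step described above can be sketched as follows. This is a minimal illustration, not the patch's actual test code. The names `SEED_ENV_VAR`, `choose_seed`, `corrupt_bytes` and `corrupt_file` are hypothetical, and the patch does not say how the seed override is exposed, so the environment variable here is an assumption.

```python
import os
import random

# Hypothetical override mechanism: the patch mentions "a mechanism to
# override the seed" but not its form; an environment variable is assumed.
SEED_ENV_VAR = "SCANNER_FUZZ_SEED"


def choose_seed():
    """Return the overridden seed if one is set, else a fresh random seed."""
    override = os.environ.get(SEED_ENV_VAR)
    if override is not None:
        return int(override)
    return random.randrange(2**32)


def corrupt_bytes(data, rng):
    """Corrupt `data` in one of two ways, chosen at random:
    set a single byte to a random value, or truncate to a random length."""
    if not data:
        return data
    if rng.random() < 0.5:
        # Single-byte corruption (the new value may equal the old one).
        buf = bytearray(data)
        buf[rng.randrange(len(buf))] = rng.randrange(256)
        return bytes(buf)
    # Truncation to a random length strictly shorter than the original.
    return data[:rng.randrange(len(data))]


def corrupt_file(path, rng):
    """Overwrite the file at `path` with a corrupted copy of its contents."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(corrupt_bytes(data, rng))
```

A run would call `seed = choose_seed()`, log the seed, build `rng = random.Random(seed)` and then apply `corrupt_file(path, rng)` to every file in the copied table. Because all randomness flows from one seeded generator, re-running with the logged seed reproduces the same corruptions, provided the files are visited in a stable order.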

The fuzzer has found crashes resulting from corrupted or truncated
input files for RCFile, SequenceFile, Parquet, and Text LZO so far.
Avro only had a small buffer read overrun detected by ASAN.

Includes fixes for Parquet crashes found by the fuzzer and a
small buffer overrun in Avro.

Initially it is only enabled for Avro, Parquet, and uncompressed
text. As follow-up work we should fix the bugs in the other scanners
and enable the test for them.

We also don't implement abort_on_error=0 correctly in Parquet and
some other file formats: corrupt headers can cause the query to be
aborted, so an exception from the query marks the test as xfail.
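The scan loop and the xfail behaviour described above can be sketched roughly like this. It is a self-contained illustration under stated assumptions, not the patch's code. `execute` is a hypothetical callable that runs a query with the given query options. `QueryAborted` and `ExpectedFailure` are stand-ins for the client's error type and for pytest's xfail outcome (in a real pytest suite one would call `pytest.xfail(reason)` instead). The batch sizes are hypothetical, since the patch only says "several different batch_size settings".

```python
# Hypothetical batch sizes; the patch does not list the values it uses.
BATCH_SIZES = [1, 16, 1024]


class QueryAborted(Exception):
    """Stand-in for the error raised when a query is aborted."""


class ExpectedFailure(Exception):
    """Stand-in for pytest's xfail outcome in this self-contained sketch."""


def scan_corrupted_table(execute, table):
    """Scan the whole corrupted table once per batch size.

    `execute(query, options)` is a hypothetical callable that raises
    QueryAborted when the query is aborted. Any other exception
    propagates and counts as a real test failure.
    """
    # Select every column so the scanner has to decode all of the data.
    query = "SELECT * FROM {0}".format(table)
    results = {}
    for batch_size in BATCH_SIZES:
        options = {"abort_on_error": 0, "batch_size": batch_size}
        try:
            results[batch_size] = execute(query, options)
        except QueryAborted as e:
            # Ideally abort_on_error=0 would skip the corrupt data, but some
            # formats still abort on corrupt headers: report an expected failure.
            raise ExpectedFailure(str(e))
    return results
```

The point of the split is that an aborted query is a known, tolerated shortcoming, while a crash or any other error still fails the test.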

Ran the test with exploration_strategy=exhaustive in a loop locally
with both DEBUG and ASAN builds. It's been running successfully for
a few hours now without hitting any crashes or test failures after
I made the fixes included in the patch. Also ran an exhaustive private
build and a core ASAN build.

Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
M be/src/exec/
M be/src/exec/
M be/src/exec/
M be/src/exec/parquet-column-readers.h
M be/src/exec/
M be/src/exec/parquet-metadata-utils.h
M be/src/runtime/
A be/src/runtime/scoped-buffer.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/
M be/src/util/dict-encoding.h
M be/src/util/
M be/src/util/rle-encoding.h
M be/src/util/
M testdata/workloads/functional-query/queries/QueryTest/parquet.test
M tests/common/
M tests/query_test/
18 files changed, 362 insertions(+), 44 deletions(-)

  git pull ssh:// refs/changes/48/3448/5

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-Reviewer: Tim Armstrong <>
