impala-dev mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3441, IMPALA-3659: check for malformed Avro data
Date Mon, 13 Jun 2016 15:16:56 GMT
Tim Armstrong has uploaded a new patch set (#18).

Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data

IMPALA-3441, IMPALA-3659: check for malformed Avro data

This patch adds error checking to the Avro scanner (both the codegen'd
and interpreted paths), including out-of-bounds checks and data
validity checks.
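As a rough illustration of what such checks involve, here is a sketch that is not the code in this patch. Avro encodes string lengths and union branch indices as zig-zag variable-length integers, so a decoder reading untrusted bytes has to reject truncated varints, negative or oversized lengths, and out-of-range branch indices. The names `ReadZLong`, `ReadAvroString` and `ReadUnionBranch` are hypothetical. The sketch also reads from a plain pointer pair rather than Impala's scanner-context buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not Impala's scanner API.
// Each reader consumes bytes from [*data, end) and advances *data only on success.

// Decodes one zig-zag varint (Avro int/long). Fails if the input ends before
// the final byte or the encoding runs past 10 bytes (too long for 64 bits).
bool ReadZLong(const uint8_t** data, const uint8_t* end, int64_t* out) {
  const uint8_t* p = *data;
  uint64_t raw = 0;
  int shift = 0;
  while (true) {
    if (p >= end) return false;     // truncated varint
    if (shift >= 64) return false;  // over-long encoding
    uint8_t byte = *p++;
    raw |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) break;  // high bit clear: last byte
    shift += 7;
  }
  // Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
  *out = static_cast<int64_t>(raw >> 1) ^ -static_cast<int64_t>(raw & 1);
  *data = p;
  return true;
}

// Reads an Avro string/bytes value: a zig-zag length, then that many bytes.
bool ReadAvroString(const uint8_t** data, const uint8_t* end,
                    const uint8_t** str, int64_t* len) {
  const uint8_t* p = *data;
  int64_t n = 0;
  if (!ReadZLong(&p, end, &n)) return false;
  if (n < 0) return false;  // negative length
  if (static_cast<uint64_t>(n) > static_cast<uint64_t>(end - p)) {
    return false;  // length runs past the end of the buffer
  }
  *str = p;
  *len = n;
  *data = p + n;
  return true;
}

// Reads a union branch index and checks it against the schema's branch count.
bool ReadUnionBranch(const uint8_t** data, const uint8_t* end,
                     int64_t num_branches, int64_t* branch) {
  const uint8_t* p = *data;
  int64_t idx = 0;
  if (!ReadZLong(&p, end, &idx)) return false;
  if (idx < 0 || idx >= num_branches) return false;  // invalid branch
  *branch = idx;
  *data = p;
  return true;
}
```

Each reader advances `*data` only when it succeeds, so a rejected value leaves the caller positioned at the start of the bad value, which is convenient for error reporting.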

I ran a local benchmark using the following queries:
  set num_scanner_threads=1;
  select count(i) from default.avro_bigints_big; # file contains only longs
  select max(l_orderkey) from biglineitem_avro; # file has tpch.lineitem schema

Both benchmark queries show negligible or no performance impact.

This patch adds a new Avro scanner unit test and an end-to-end test
that queries several corrupted files, and updates the zig-zag
variable-length integer unit test.
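The zig-zag variable-length integer encoding that this unit test covers is the one defined by the Avro specification. The following is a minimal sketch of the encoder in C++, checked against the example encodings listed in that specification (0 → 00, -1 → 01, 1 → 02, -64 → 7f, 64 → 80 01). `WriteZLong` is a hypothetical name, not the function the patch's test exercises.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Impala's API): appends the Avro zig-zag varint
// encoding of v to *out, 7 bits per byte, least significant group first.
void WriteZLong(int64_t v, std::vector<uint8_t>* out) {
  // Zig-zag maps signed to unsigned so small magnitudes encode in few bytes:
  // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  uint64_t sign = v < 0 ? ~uint64_t{0} : uint64_t{0};
  uint64_t z = (static_cast<uint64_t>(v) << 1) ^ sign;
  while (z >= 0x80) {
    out->push_back(static_cast<uint8_t>(z | 0x80));  // more bytes follow
    z >>= 7;
  }
  out->push_back(static_cast<uint8_t>(z));  // final byte, high bit clear
}
```

A 64-bit value never needs more than 10 bytes, which is the bound a decoder can use to reject over-long input.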

Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
M be/src/exec/CMakeLists.txt
M be/src/exec/
M be/src/exec/base-sequence-scanner.h
M be/src/exec/
A be/src/exec/
M be/src/exec/
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/
M be/src/exec/
M be/src/exec/hdfs-scanner.h
M be/src/exec/
M be/src/exec/read-write-util.h
M be/src/exec/
M be/src/exec/scanner-context.h
M be/src/exec/scanner-context.inline.h
M be/src/exec/
M common/thrift/
A testdata/bad_avro_snap/README
A testdata/bad_avro_snap/invalid_union.avro
A testdata/bad_avro_snap/negative_string_len.avro
A testdata/bad_avro_snap/truncated_float.avro
A testdata/bad_avro_snap/truncated_string.avro
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M tests/common/
M tests/data_errors/
27 files changed, 1,134 insertions(+), 233 deletions(-)

  git pull ssh:// refs/changes/72/3072/18
To view, visit

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 18
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Skye Wanderman-Milne <>
Gerrit-Reviewer: Tim Armstrong <>
