impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Skye Wanderman-Milne (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3441: check for malformed Avro data
Date Thu, 02 Jun 2016 23:18:57 GMT
Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-3441: check for malformed Avro data
......................................................................


Patch Set 5:

(4 comments)

I used perf top with the --perf_map option and ReadZLong() is where most of the time is spent:

  18.47%  impalad [.] _ZN6impala13ReadWriteUtil9ReadZLongEPPhPlS3_
   9.11%  impalad [.] _ZN6snappy13RawUncompressEPNS_6SourceEPc

Unfortunately I need to add even more error checking to this function (IMPALA-3659), so maybe
we should get this in and work on ReadZLong() perf in that patch.

http://gerrit.cloudera.org:8080/#/c/3072/6/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 176:     ReportInvalidValue(TErrorCode::AVRO_INVALID_LENGTH, len);
> UNLIKELY
Done


Line 180:     ReportCorruptData(TErrorCode::AVRO_TRUNCATED_BLOCK);
> UNLIKELY
Done


http://gerrit.cloudera.org:8080/#/c/3072/5/be/src/exec/hdfs-avro-scanner-test.cc
File be/src/exec/hdfs-avro-scanner-test.cc:

Line 190:   // TODO: we don't handle invalid values (e.g. overflow)
> There's an ASSERT_DEBUG_DEATH macro that does some magic to assert that it 
Dan pointed out that we need to fix this one way or another or we still may crash on corrupt
files, so I'm gonna skip this for now since I need to fix it ASAP anyway. I'll add a test
case for it when I fix it.


http://gerrit.cloudera.org:8080/#/c/3072/6/testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
File testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test:

Line 7
> Are you intending to fix this in this patch?
Oops, forgot to address this. I'm having a lot of trouble matching these error messages with
the current test verifier logic, so I commented them out for now and meant to file a JIRA
to fix this.


-- 
To view, visit http://gerrit.cloudera.org:8080/3072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Skye Wanderman-Milne <skye@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message