Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 855E4200B0F for ; Fri, 3 Jun 2016 01:19:04 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 83DAF160A52; Thu, 2 Jun 2016 23:19:04 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id CE850160A51 for ; Fri, 3 Jun 2016 01:19:03 +0200 (CEST) Received: (qmail 26011 invoked by uid 500); 2 Jun 2016 23:19:03 -0000 Mailing-List: contact dev-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@impala.incubator.apache.org Delivered-To: mailing list dev@impala.incubator.apache.org Received: (qmail 26000 invoked by uid 99); 2 Jun 2016 23:19:02 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 02 Jun 2016 23:19:02 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 506E7C06CD for ; Thu, 2 Jun 2016 23:19:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id dCJcnWzdbNeb for ; Thu, 2 Jun 2016 23:19:00 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id E36A85F240 for ; Thu, 2 Jun 2016 23:18:59 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id u52NIwXE015148; Thu, 2 Jun 2016 23:18:58 GMT Message-Id: <201606022318.u52NIwXE015148@ip-10-146-233-104.ec2.internal> Date: Thu, 2 Jun 2016 23:18:57 +0000 From: "Skye Wanderman-Milne (Code Review)" To: Alex Behm , impala-cr@cloudera.com, dev@impala.incubator.apache.org CC: Dan Hecht , Tim Armstrong Reply-To: skye@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-CR=5D=28cdh5-trunk=29_IMPALA-3441=3A_check_for_malformed_Avro_data=0A?= X-Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132 X-Gerrit-ChangeURL: X-Gerrit-Commit: 4ffacfcd8cd74140f98704a410c2ebd183f5e871 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Thu, 02 Jun 2016 23:19:04 -0000 Skye Wanderman-Milne has posted comments on this change. Change subject: IMPALA-3441: check for malformed Avro data ...................................................................... Patch Set 5: (4 comments) I used perf top with the --perf_map option and ReadZLong() is where most of the time is spent: 18.47% impalad [.] _ZN6impala13ReadWriteUtil9ReadZLongEPPhPlS3_ 9.11% impalad [.] _ZN6snappy13RawUncompressEPNS_6SourceEPc Unfortunately I need to add even more error checking to this function (IMPALA-3659), so maybe we should get this in and work on ReadZLong() perf in that patch. http://gerrit.cloudera.org:8080/#/c/3072/6/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 176: ReportInvalidValue(TErrorCode::AVRO_INVALID_LENGTH, len); > UNLIKELY Done Line 180: ReportCorruptData(TErrorCode::AVRO_TRUNCATED_BLOCK); > UNLIKELY Done http://gerrit.cloudera.org:8080/#/c/3072/5/be/src/exec/hdfs-avro-scanner-test.cc File be/src/exec/hdfs-avro-scanner-test.cc: Line 190: // TODO: we don't handle invalid values (e.g. overflow) > There's an ASSERT_DEBUG_DEATH macro that does some magic to assert that it Dan pointed out that we need to fix this one way or another or we still may crash on corrupt files, so I'm gonna skip this for now since I need to fix it ASAP anyway. I'll add a test case for it when I fix it. http://gerrit.cloudera.org:8080/#/c/3072/6/testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test File testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test: Line 7 > Are you intending to fix this in this patch? Oops, forgot to address this. I'm having a lot of trouble matching these error messages with the current test verifier logic, so I commented them out for now and meant to file a JIRA to fix this. -- To view, visit http://gerrit.cloudera.org:8080/3072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Skye Wanderman-Milne Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Skye Wanderman-Milne Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes