Date: Thu, 25 Aug 2016 10:20:36 +0000
From: "Internal Jenkins (Code Review)"
To: Henry Robinson, impala-cr@cloudera.com, dev@impala.incubator.apache.org
Subject: [Impala-ASF-CR] IMPALA-(3895,3859): Don't log file data on parse errors
X-Gerrit-Commit: 34b5f1c416148f95a34324d66c1ebbf9585d1845

Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-(3895,3859): Don't log file data on parse errors
......................................................................


IMPALA-(3895,3859): Don't log file data on parse errors

Logging file or table data is a bad idea, and doing it by default is
particularly bad. This patch changes HdfsScanNode::LogRowParseError() to
log only the file name and offset.

Testing: See rewritten tests.

To support testing this change, we also fix IMPALA-3895 by introducing
a canonical string, __HDFS_FILENAME__, which replaces every Hadoop
filename in the ERROR output before that output is compared with the
expected results. This fixes several issues with the old way of matching
filenames, which purported to be a regex but really wasn't. In
particular, we can now match the rest of an ERROR line after the
filename, which was not possible before.

In some cases we don't want to substitute filenames, because the test
expects a very specific ERROR output.
In that case we can write:

  $NAMENODE/

and this patch will not perform _any_ filename substitutions on ERROR
sections that contain the $NAMENODE string.

Finally, this patch fixes a bug where a test that had an ERRORS section
but no RESULTS section would silently pass without testing anything.

Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272
Reviewed-on: http://gerrit.cloudera.org:8080/4020
Reviewed-by: Henry Robinson
Tested-by: Internal Jenkins
---
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/DataErrorsTest/hbase-scan-node-errors.test
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-rcfile-scan-node-errors.test
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-sequence-scan-errors.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/strict-mode-abort.test
M testdata/workloads/functional-query/queries/QueryTest/strict-mode.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/util/filesystem_utils.py
M tests/util/hdfs_util.py
19 files changed, 393 insertions(+), 407 deletions(-)

Approvals:
  Henry Robinson: Looks good to me, approved
  Internal Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/4020

Gerrit-MessageType: merged
Gerrit-Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Henry Robinson
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong
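The commit message above describes three mechanisms: the parse-error log line that now carries only a file and offset, the __HDFS_FILENAME__ canonicalization with its $NAMENODE opt-out, and verifying ERRORS even when RESULTS is absent. The Python sketch below shows how these could fit together. It is not the code from this change. Every function name is hypothetical, as are the namenode address and the exact log-line wording. The sketch also assumes that a filename appears as an hdfs:// URI ending at the next whitespace or comma.

```python
import re

# Hypothetical names for illustration; the real test framework may differ.
HDFS_FILENAME_PLACEHOLDER = "__HDFS_FILENAME__"
NAMENODE_MARKER = "$NAMENODE"

# Assumption: an HDFS filename is an "hdfs://..." URI that ends at the next
# whitespace character or comma, so any text after the filename survives.
_HDFS_PATH_RE = re.compile(r"hdfs://[^\s,]+")


def row_parse_error_message(filename, offset):
    """Build a parse-error log line from the file and offset only (no row data).

    The exact wording is an assumption made for this sketch."""
    return "Error parsing row: file: %s, before offset: %d" % (filename, offset)


def canonicalize_error_line(line):
    """Replace every HDFS filename in one ERROR line with the placeholder."""
    return _HDFS_PATH_RE.sub(HDFS_FILENAME_PLACEHOLDER, line)


def prepare_errors(expected, actual, namenode):
    """Return (expected, actual) ERROR lines ready for an exact comparison.

    If any expected line contains the $NAMENODE marker, no filename
    substitution is done at all; the marker is expanded to the real namenode
    prefix so the test can match literal paths instead."""
    if any(NAMENODE_MARKER in line for line in expected):
        return [l.replace(NAMENODE_MARKER, namenode) for l in expected], list(actual)
    return list(expected), [canonicalize_error_line(l) for l in actual]


def verify_test_case(sections, actual_results, actual_errors, namenode):
    """Check one test case given as a dict of section name -> list of lines.

    ERRORS is verified even when RESULTS is absent, so such a case can no
    longer pass without checking anything."""
    if "RESULTS" in sections and sections["RESULTS"] != actual_results:
        raise AssertionError("RESULTS mismatch: %r != %r"
                             % (sections["RESULTS"], actual_results))
    if "ERRORS" in sections:
        expected, actual = prepare_errors(sections["ERRORS"], actual_errors, namenode)
        if expected != actual:
            raise AssertionError("ERRORS mismatch: %r != %r" % (expected, actual))


if __name__ == "__main__":
    nn = "hdfs://localhost:20500"  # hypothetical namenode address
    err = row_parse_error_message(nn + "/test-warehouse/t/data.txt", 1234)
    print(canonicalize_error_line(err))
    # prints: Error parsing row: file: __HDFS_FILENAME__, before offset: 1234
    verify_test_case(
        {"ERRORS": ["Error parsing row: file: __HDFS_FILENAME__, before offset: 1234"]},
        actual_results=[], actual_errors=[err], namenode=nn)
```

Because the assumed pattern stops at a comma, the text after the filename (", before offset: 1234") is still compared literally. This illustrates the property the commit message highlights: the rest of an ERROR line after the filename can now be matched.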