From: "Impala Public Jenkins (Code Review)"
To: Lars Volker, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Date: Wed, 13 Sep 2017 09:20:54 +0000
Subject: [Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
X-Gerrit-Commit: 322e2dc80259cfa712fc6d0d224d2c2c16a6708d

Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5890: Abort queries if scanner hits IO errors
......................................................................

IMPALA-5890: Abort queries if scanner hits IO errors

Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer()
could leave the stream in an inconsistent state:

- The DiskIoMgr hits EOF unexpectedly, cancels the scan range and
  enqueues a buffer with eosr set.
- The ScannerContext::Stream tries to read more bytes, but since it has
  hit eosr, it tries to read beyond the end of the scan range using
  DiskIoMgr::Read().
- The previous read error resulted in a new file handle being opened.
  The now truncated, smaller file causes the seek to fail.
- Then during error handling, the BaseSequenceScanner calls SkipToSync()
  and trips over the NULL pointer in the IO buffer.

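The failure sequence above can be illustrated with a small, self-contained C++ sketch. It is not Impala's code: Status, Code, Buffer, ReadRange() and ScanRange() are hypothetical names invented for this example, and the real DiskIoMgr and ScannerContext interfaces differ. It only shows the general idea of reporting a short read as a single I/O error code and having the scanner return that error to its caller instead of touching a buffer that may be NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical error codes; DISK_IO_ERROR only borrows the name used in the text.
enum class Code { OK, DISK_IO_ERROR };

// Hypothetical status type, not Impala's Status class.
struct Status {
  Code code;
  std::string msg;
  Status() : code(Code::OK) {}
  Status(Code c, std::string m) : code(c), msg(std::move(m)) {}
  bool ok() const { return code == Code::OK; }
};

// Hypothetical buffer handed from the I/O layer to a scanner.
struct Buffer {
  const uint8_t* data = nullptr;
  size_t len = 0;
  bool eosr = false;  // end of scan range reached
};

// Simulates reading `want` bytes at `offset` from a file that currently holds
// `file_len` bytes. A file shorter than the requested range (for example one
// that was truncated after the scan range was planned) yields DISK_IO_ERROR
// and leaves `out` empty with a null data pointer.
Status ReadRange(const uint8_t* file, size_t file_len, size_t offset,
                 size_t want, Buffer* out) {
  *out = Buffer();
  if (offset > file_len || want > file_len - offset) {
    return Status(Code::DISK_IO_ERROR, "unexpected EOF within scan range");
  }
  out->data = file + offset;
  out->len = want;
  out->eosr = true;
  return Status();
}

// Scanner step: on an I/O error, return the error so the caller can abort,
// rather than trying to recover by reading from the (null) buffer.
Status ScanRange(const uint8_t* file, size_t file_len, size_t offset,
                 size_t want, size_t* bytes_scanned) {
  assert(bytes_scanned != nullptr);
  Buffer buf;
  Status s = ReadRange(file, file_len, offset, want, &buf);
  if (!s.ok()) return s;  // buf.data is null here and must not be dereferenced
  *bytes_scanned = buf.len;
  return Status();
}
```

In this sketch, calling ScanRange() with a file_len smaller than the requested range returns a status whose code is DISK_IO_ERROR, and the caller is expected to stop the scan at that point.
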
In my reproduction this only happens with the file handle cache enabled,
which causes Impala to see two different sized handles: the one from the
cache when the query starts, and the one after reopening the file.

To fix this, we change the I/O manager to always return DISK_IO_ERROR
for errors and we abort a query if we receive such an error in the
scanner.

This change also fixes GetBytesInternal() to maintain the invariant that
the output buffer points to the boundary buffer whenever the latter
contains some data.

I tested this by running the repro from the JIRA and impalad did not
crash but aborted the queries. I also ran the repro with
abort_on_error=1, and with the file handle cache disabled.

Text files are not affected by this problem, since the text scanner
doesn't try to recover from errors during ProcessRange() but wraps it in
RETURN_IF_ERROR instead. With this change queries abort with the same
error.

Parquet files are also not affected since they have the metadata at the
end. Truncated files immediately fail with this error:

WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number:

Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Reviewed-on: http://gerrit.cloudera.org:8080/8011
Reviewed-by: Dan Hecht
Tested-by: Impala Public Jenkins
---
M be/src/common/status.h
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/runtime-state.cc
M common/thrift/generate_error_codes.py
9 files changed, 104 insertions(+), 55 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/8011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Tim Armstrong