Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 6128E200D04 for ; Mon, 11 Sep 2017 23:44:38 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 5FE8A1609C4; Mon, 11 Sep 2017 21:44:38 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id A65A41609B7 for ; Mon, 11 Sep 2017 23:44:37 +0200 (CEST) Received: (qmail 95747 invoked by uid 500); 11 Sep 2017 21:44:35 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 95735 invoked by uid 99); 11 Sep 2017 21:44:35 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 Sep 2017 21:44:35 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id DD6A3188C5E for ; Mon, 11 Sep 2017 21:44:34 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id UfSjtJ00P7BG for ; Mon, 11 Sep 2017 21:44:33 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id A857C5FD48 for ; Mon, 11 Sep 2017 21:44:33 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v8BLiWo3029615; Mon, 11 Sep 2017 21:44:32 GMT Message-Id: <201709112144.v8BLiWo3029615@ip-10-146-233-104.ec2.internal> Date: Mon, 11 Sep 2017 21:44:32 +0000 From: "Lars Volker (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Joe McDonnell , Tim Armstrong , Dan Hecht Reply-To: lv@cloudera.com X-Gerrit-MessageType: newpatchset Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-5890=3A_Abort_queries_if_scanner_hits_IO_errors=0A?= X-Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 X-Gerrit-ChangeURL: X-Gerrit-Commit: 546d83b3ff6d618e6a39c1743a534559355c3bb4 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Mon, 11 Sep 2017 21:44:38 -0000 Lars Volker has uploaded a new patch set (#6). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors ...................................................................... IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we abort a scan range if it encounters an I/O error. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 7 files changed, 66 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/6 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong