Date: Tue, 14 Jun 2016 11:31:28 +0000
From: "Bharath Vissapragada (Code Review)"
To: impala-cr@cloudera.com, dev@impala.incubator.apache.org
CC: Dan Hecht
Subject: [Impala-CR](cdh5-trunk) IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Bharath Vissapragada has uploaded a new patch set (#4).

Change subject: IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads
......................................................................

IMPALA-3680: Cleanup the scan range state after failed hdfs cache reads

Currently we don't reset the file read offset when a zero-copy read (ZCR) from the HDFS cache fails. As a result, when we fall back to the normal read path, we hit the end of the scan range (eosr) before reading the expected data length. If both the ReadFromCache() and ReadRange() calls fail without reading any data, we end up creating a long list of scan ranges, each 1KB in size (DEFAULT_READ_PAST_SIZE), on the assumption that we are reading past the end of the scan range. This causes a large performance hit.

This patch calls ScanRange::Close() after a failed cache read to clean up the file system state, so that the re-read starts from the beginning of the scan range.
This was hit while debugging IMPALA-3679, where queries on 1GB of cached data were running ~20x slower than non-cached runs.

Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
---
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/query_test/test_hdfs_caching.py
4 files changed, 67 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3313/4
--
To view, visit http://gerrit.cloudera.org:8080/3313

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a9ea19dd8571b01d2cd5b87da1c259219f6297a
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: Dan Hecht