Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id C77A9200BEF for ; Wed, 21 Dec 2016 01:16:31 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id C5FA9160B33; Wed, 21 Dec 2016 00:16:31 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id EB097160B29 for ; Wed, 21 Dec 2016 01:16:30 +0100 (CET) Received: (qmail 16522 invoked by uid 500); 21 Dec 2016 00:16:30 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 16507 invoked by uid 99); 21 Dec 2016 00:16:29 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 21 Dec 2016 00:16:29 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 85CD11A00E4 for ; Wed, 21 Dec 2016 00:16:29 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id R8IbUukyZXva for ; Wed, 21 Dec 2016 00:16:27 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 570AD5F342 for ; Wed, 21 Dec 2016 00:16:27 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id uBL0GLk1026532; Wed, 21 Dec 2016 00:16:21 GMT Message-Id: <201612210016.uBL0GLk1026532@ip-10-146-233-104.ec2.internal> Date: Wed, 21 Dec 2016 00:16:21 +0000 From: "Dan Hecht (Code Review)" To: Tim Armstrong , impala-cr@cloudera.com, reviews@impala.incubator.apache.org Reply-To: dhecht@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-3202=2CIMPALA-2079=3A_rework_scratch_file_I/O=0A?= X-Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 X-Gerrit-ChangeURL: X-Gerrit-Commit: caf21f1dc8d6c8926fde5ba1dcc8fd6aed451ebc In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Wed, 21 Dec 2016 00:16:32 -0000 Dan Hecht has posted comments on this change. Change subject: IMPALA-3202,IMPALA-2079: rework scratch file I/O ...................................................................... Patch Set 20: (24 comments) http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 633: // The block's buffer is still in memory with the original data. ... but we couldn't get additional reservation. However, we can use 'release_block's reservation, so reclaim the block's buffer. or similar Line 656: // FindBufferForBlock() wasn't able to find a buffer so transfer the one from 'release_block'. Line 845: if (block->write_handle_ != NULL) { how could write_handle_ be NULL here? anyway, okay with leaving this if-stmt alone for this legacy code. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 363: int64_t GetNumWritesOutstanding(); since the test class is a friend, can this be made private? PS20, Line 403: pinned state but this method doesn't actually set the pin bit on the block... isn't this method really just undoing any side effects of an in-flight write? so maybe CancelInFlightWrite() or CancelWrite() or RestoreBufferContents() or something? PS20, Line 458: or double or http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: PS20, Line 1167: with will Line 1173: write_range->file_, errno, GetStrErrMsg()))); weird line break http://gerrit.cloudera.org:8080/#/c/5141/16/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: Line 83: > We could, but then we'd have to maintain a sparse map of 'disk_id' -> 'devi is each explained somewhere? so disk_id_ is a disk-io-mgr concept, while DeviceId is an internal to tmp-file-mgr concept to get a dense numbering of scratch disks, right? let's make sure that's clear somewhere (maybe it is already in the main header). http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/tmp-file-mgr-internal.h File be/src/runtime/tmp-file-mgr-internal.h: Line 41: /// The physical file is created on the first call to AllocateSpace(). is that true? PS20, Line 43: on Line 93: /// blacklisted, the corresponding device will always be blacklisted. hmm i don't see how a device is blacklisted. let's add some verification debug code to make sure these stay in sync. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/tmp-file-mgr.cc File be/src/runtime/tmp-file-mgr.cc: Line 404: // Don't hold 'lock_' outside AllocateSpace() to minimise contention. not sure what this comment is telling me. PS20, Line 499: Log the error before retrying so that the failure isn't swallowed silently. is this referring to the old LogError() code, or the scratch_errors_? Line 507: handle->file_->ReportIOError(write_status.msg()); this does the file-level blacklist. But i don't see where the device-level blacklist happens... http://gerrit.cloudera.org:8080/#/c/5141/19/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: Line 330: /// so no locking is required. This is a terminal lock and should not be held while > I think it's an orthogonal issue - which is that all of error logging is do There's nothing really about this class that makes it need to have query scope either, right? anyway, great that we can remove it -- that's the best solution. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/runtime/tmp-file-mgr.h File be/src/runtime/tmp-file-mgr.h: PS20, Line 40: local delete, or say (usually local) or similar PS20, Line 48: Temporary files are managed via a FileGroup. To write to a temporary file, first a : /// FileGroup is created, then FileGroup::Write() is called to asynchronously write a : /// memory buffer to disk. This still kinda makes it sound like the user of this class can care about the file itself. How about saying: FileGroups manage temporary file space across the multiple devices. To write to the temporary file space, first a FileGroup is created, then ... PS20, Line 78: It is used as a handle for external classes to identify devices. if this is only used externally for tests, I think we should say something different. it sounds like the value of this is mostly internally to have a dense numbering of scratch devices. PS20, Line 167: w handle->write_in_flight_ ? but those things happen in Handle::SetWriteComplete() so maybe reword. Line 276: void Cancel(); does this need to be exposed, or could all callers just use CancelWriteAndRestoreData()? This isn't included as part of the lifecycle. PS20, Line 300: SetWriteComplete "Set" makes it sound like it just sets state. Maybe just call this WriteComplete() too. http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/util/buffer-ref.h File be/src/util/buffer-ref.h: PS20, Line 27: Ref what does "Ref" stand for? Reference? This might get confused with BufferHandle, which is really the reference to a (buffer pool) buffer. This is more like a memory range. Maybe "BufferRange" or "BufRange" or MemRange? And how about using this in SubAllocation? (can be another commit) http://gerrit.cloudera.org:8080/#/c/5141/20/be/src/util/disk-info.h File be/src/util/disk-info.h: Line 91: static int num_datanode_dirs_; dead code also? -- To view, visit http://gerrit.cloudera.org:8080/5141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c9c587df006d2f09d72dd636adafbd295fcdc17 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes