Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 001EA200BE0 for ; Sat, 3 Dec 2016 03:12:20 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id F2EC6160B27; Sat, 3 Dec 2016 02:12:20 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 20846160AF6 for ; Sat, 3 Dec 2016 03:12:19 +0100 (CET) Received: (qmail 67646 invoked by uid 500); 3 Dec 2016 02:12:19 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 67619 invoked by uid 99); 3 Dec 2016 02:12:19 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 03 Dec 2016 02:12:19 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id A728DC0126 for ; Sat, 3 Dec 2016 02:12:18 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.363 X-Spam-Level: X-Spam-Status: No, score=0.363 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id zF67GyoWqHsJ for ; Sat, 3 Dec 2016 02:12:16 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id C77D95FAC8 for ; Sat, 3 Dec 2016 02:12:15 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id uB32CEPg022482; Sat, 3 Dec 2016 02:12:14 GMT Message-Id: <201612030212.uB32CEPg022482@ip-10-146-233-104.ec2.internal> Date: Sat, 3 Dec 2016 02:12:14 +0000 From: "Henry Robinson (Code Review)" To: Marcel Kornacker , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Matthew Jacobs , Alex Behm , Lars Volker , Sailesh Mukil Reply-To: henry@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4014=3A_Introduce_query-wide_execution_state=2E=0A?= X-Gerrit-Change-Id: I962ae6b7cb7dc0d07fbb8f70317aeb01d88d400b X-Gerrit-ChangeURL: X-Gerrit-Commit: 73efaae714a2feb42bd99b6cd2090609a3181957 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Sat, 03 Dec 2016 02:12:21 -0000 Henry Robinson has posted comments on this change. Change subject: IMPALA-4014: Introduce query-wide execution state. ...................................................................... Patch Set 6: (11 comments) Some more comments about the refcounting, many in response to yours. http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS6, Line 509: qs = nullptr; superfluous, since it is never read again. http://gerrit.cloudera.org:8080/#/c/4418/4/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS4, Line 86: // start new thread to execute instance : Thread::Run("query-exec-mgr", : Substitute("exec-fragment-instance-$0", PrintId( > i felt both approaches are equally valid. Having some fragment instance management logic here confuses the responsibilities of this class and of QueryState (which I would expect to control the lifecycle of the fragment instance as well as contain its metadata). There naturally seems to be a hierarchy of responsibilities: QueryExecMgr manages QueryStates, QueryStates manage FragmentInstances. That encapsulation makes the code easier to reason about. PS4, Line 115: // always decrement refcount > i thought about this, but didn't see it as a bug. Maybe it's not a bug as implemented, but it's counter-intuitive that there could be more than one QS over time (it's tripped two reviewers up now). It is a simpler invariant that there is one QES over the lifetime of the query. I think there are bugs waiting to happen. For example: we add aggregated state / metrics to the QES and then wonder why they get reset over time. PS4, Line 147: if (it == qs_map_.end()) return; > after decreasing the refcnt to 0 in l138 someone else could have increased I think this is again unnecessarily confusing. Why not have the semantics such that: refcount == 0 -> no more references may be taken && caller to RQS() that sets count == 0 performs (or schedules) the GC. If we stipulate that a query is live for as long as its fragments are running, then having the QES start out with refcount == 1 (like you already do) is compatible with the above invariant, and much easier to reason about. http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS6, Line 76: qs->refcnt_.Load() beware that this can change between line 72 and now. Just store the result of Add() and log that. http://gerrit.cloudera.org:8080/#/c/4418/4/be/src/runtime/query-exec-mgr.h File be/src/runtime/query-exec-mgr.h: PS4, Line 58: > i'll change this comment to fit the existing behavior. Elsewhere you say that you're happy with a QS getting recreated if the original gets GCed. I think that allowing refcount == 0 as a valid live state complicates the state machine for a refcounted object. For now, if the refcount hits 0, it's not deterministic whether or not a QS will get GC'ed. It's more understandable to make that relationship unambiguous, and I've expanded on that in other comments. Who are the clients that you expect to do a second registration? If you're concerned about the case where a fragment completes before another fragment arrives, that seems like an edge case that's not worth adapting the semantics for (given that that case won't be possible once batching happens). Eventually I would expect registration to happen once per query, called from one path only (the batched RPC handler). http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: PS6, Line 89: // Helper function to translate between Beeswax and HiveServer2 type : static TOperationState::type QueryStateToTOperationState( : const beeswax::QueryState::type& query_state) { : switch (query_state) { : case beeswax::QueryState::CREATED: return TOperationState::INITIALIZED_STATE; : case beeswax::QueryState::RUNNING: return TOperationState::RUNNING_STATE; : case beeswax::QueryState::FINISHED: return TOperationState::FINISHED_STATE; : case beeswax::QueryState::EXCEPTION: return TOperationState::ERROR_STATE; : default: return TOperationState::UKNOWN_STATE; : } : } While you're here, put in anonymous namespace. http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/util/thread.cc File be/src/util/thread.cc: PS6, Line 305: //ThreadMgr* thread_mgr = ExecEnv::GetInstance()->thread_mgr(); ? PS6, Line 313: //thread_mgr_ref->AddThread(this_thread::get_id(), name, category, system_tid); ? http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/util/thread.h File be/src/util/thread.h: PS6, Line 146: /// Starts a detached thread running 'functor' and returns immediately. : static void StartDetachedThread(con What's a detached thread? Why doesn't this method register threads so that we can monitor them? PS6, Line 172: SuperviseDetachedThread Needs a comment. At this moment I don't see why this exists. -- To view, visit http://gerrit.cloudera.org:8080/4418 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I962ae6b7cb7dc0d07fbb8f70317aeb01d88d400b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Marcel Kornacker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes