Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 889FA200CD1 for ; Wed, 12 Jul 2017 08:24:01 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 87252167FE9; Wed, 12 Jul 2017 06:24:01 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id A5DEA167FE0 for ; Wed, 12 Jul 2017 08:24:00 +0200 (CEST) Received: (qmail 82287 invoked by uid 500); 12 Jul 2017 06:23:59 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 82265 invoked by uid 99); 12 Jul 2017 06:23:59 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Jul 2017 06:23:59 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 1825C1960E5 for ; Wed, 12 Jul 2017 06:23:59 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id OokdgdslcRkM for ; Wed, 12 Jul 2017 06:23:57 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id F0D1C62639 for ; Wed, 12 Jul 2017 06:15:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id v6C6FCMA002801; Wed, 12 Jul 2017 06:15:12 GMT Message-Id: <201707120615.v6C6FCMA002801@ip-10-146-233-104.ec2.internal> Date: Wed, 12 Jul 2017 06:15:12 +0000 From: "Tim Armstrong (Code Review)" To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Dan Hecht Reply-To: tarmstrong@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-4674=3A_Part_2=3A_port_backend_exec_to_BufferPool=0A?= X-Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e X-Gerrit-ChangeURL: X-Gerrit-Commit: 812a3186f08234561512bb53e12ad643bf97d37c In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.7 archived-at: Wed, 12 Jul 2017 06:24:01 -0000 Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool ...................................................................... Patch Set 26: (18 comments) Addressed most of the comments but had a couple of questions about query and command-line option changes. http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/exec-node.h File be/src/exec/exec-node.h: Line 236: Status InitBufferPoolClient(RuntimeState* state); > the fact that this claims the initial reservation is important but not obvi ClaimBufferReservation()? http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/hash-table.h File be/src/exec/hash-table.h: PS27, Line 690: success > should that be got_memory too? Done http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: Line 277: DCHECK(got_buffer); > << "Accounted in initial reservation" or similar to explain why. Done http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 280: DCHECK(got_read_buffer); > maybe: << "Accounted in the initial reservation"; Done here and in the PAgg. Also included the buffer pool client debug stream, which may be useful for debugging. Line 833: DCHECK(got_buffer); > same Done Line 850: // TODO: we shouldn't start with this unpinned if spilling is disabled. > what's the consequence of that? it doesn't cause a dcheck now that we disal It does cause a DCHECK given the right query and options. https://gerrit.cloudera.org/#/c/7367/ adds tests and fixes the problem. Line 854: DCHECK(got_buffer); > same Done http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS27, Line 127: Star > QueryState::Init() Done http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS26, Line 149: if (mem_limit == -1) mem_limit = numeric_limits::max(); > do we allow you to run without a process limit? I thought we pick one for Ah good point. It looks like we do allow you to start up Impala without a process limit e.g. by setting --mem_limit='' This patch actually breaks that "feature" since it sets up a buffer pool with 0 limit in that case. What do you think about just disallowing that? It doesn't seem to be documented anywhere aside from the code. Line 152: if (query_options().__isset.max_block_mgr_memory > should we rename this or do you think it's important to leave it alone for It doesn't look like we document it: http://impala.apache.org/docs/build/html/topics/impala_query_options.html (although strangely there's an xml file in the docs/ folder - I guess it was never added to the list). And I don't expect people have been using it, aside from for testing What about something like "buffer_reservation_limit"? PS26, Line 157: mem_limit - RESERVATION_MEM_MIN_REMAINING) > do we have tests for when this path is chosen? I don't think we have any tests that confirm that queries execute successfully as a result of this path being chosen. This heuristic was just carried over from the old code. Do you think that would be useful to test? I can try to come up with something. Line 378: "Could not free memory by spilling to disk: spilling was disabled by planner"); > is there something actionable for the user that we could explain? Added a reference to the query option. http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: PS27, Line 89: ReturnAllocation > why not Free()? Or AllocateBytes()/FreeBytes() to distinguish from the ent Makes sense. I had just preserved the old names but this seems better. Line 111: int num_rows = 0; Dead variable. Line 119: uint8_t* data = nullptr; > do we expect that if this field is accessed directly outside this struct, t Reworked the Page abstraction so it's a proper class and moved some of the helpers from Sorter into Sorter::Page PS27, Line 637: INTERNAL_ERROR > is this really an INTERNAL_ERROR (i.e. bug)? Or is this path taken when str Good point. BTSV2 uses the MAX_ROW_SIZE error code, so I should use that here too. That has a placeholder to mention the query option. I also realised we don't validate the fixed-length part fits in a page. Added a check to Prepare(). I think now if the fixed-length part doesn't fit we'll spin in this function until we run out of memory. http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/runtime/sorter.h File be/src/runtime/sorter.h: PS27, Line 71: pinned page > nit: "pinned pages" or "a pinned page" Done PS27, Line 188: the > nit Done -- To view, visit http://gerrit.cloudera.org:8080/5801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e Gerrit-PatchSet: 26 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes