impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4674: Part 2: port backend exec to BufferPool
Date Wed, 12 Jul 2017 06:15:12 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4674: Part 2: port backend exec to BufferPool
......................................................................


Patch Set 26:

(18 comments)

Addressed most of the comments but had a couple of questions about query and command-line
option changes.

http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

Line 236:   Status InitBufferPoolClient(RuntimeState* state);
> the fact that this claims the initial reservation is important but not obvi
ClaimBufferReservation()?


http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

PS27, Line 690: success
> should that be got_memory too?
Done


http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

Line 277:   DCHECK(got_buffer);
> << "Accounted in initial reservation" or similar to explain why.
Done


http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 280:   DCHECK(got_read_buffer);
> maybe: << "Accounted in the initial reservation";
Done here and in the PAgg. Also included the buffer pool client debug stream, which may be
useful for debugging.


Line 833:   DCHECK(got_buffer);
> same
Done


Line 850:   // TODO: we shouldn't start with this unpinned if spilling is disabled.
> what's the consequence of that? it doesn't cause a dcheck now that we disal
It does cause a DCHECK given the right query and options. https://gerrit.cloudera.org/#/c/7367/
adds tests and fixes the problem.


Line 854:   DCHECK(got_buffer);
> same
Done


http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS27, Line 127: Star
> QueryState::Init()
Done


http://gerrit.cloudera.org:8080/#/c/5801/26/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS26, Line 149: if (mem_limit == -1) mem_limit = numeric_limits<int64_t>::max();
> do we allow you to run without a process limit?  I thought we pick one for 
Ah good point. It looks like we do allow you to start up Impala without a process limit e.g.
by setting --mem_limit=''

This patch actually breaks that "feature" since it sets up a buffer pool with 0 limit in that
case.

What do you think about just disallowing that? It doesn't seem to be documented anywhere aside
from the code.


Line 152:   if (query_options().__isset.max_block_mgr_memory
> should we rename this or do you think it's important to leave it alone for 
It doesn't look like we document it: http://impala.apache.org/docs/build/html/topics/impala_query_options.html
(although strangely there's an xml file in the docs/ folder - I guess it was never added to
the list).

And I don't expect people have been using it, aside from for testing

What about something like "buffer_reservation_limit"?


PS26, Line 157: mem_limit - RESERVATION_MEM_MIN_REMAINING)
> do we have tests for when this path is chosen?
I don't think we have any tests that confirm that queries execute successfully as a result
of this path being chosen. This heuristic was just carried over from the old code. Do you
think that would be useful to test? I can try to come up with something.


Line 378:         "Could not free memory by spilling to disk: spilling was disabled by planner");
> is there something actionable for the user that we could explain?
Added a reference to the query option.


http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

PS27, Line 89: ReturnAllocation
> why not Free()?  Or AllocateBytes()/FreeBytes() to distinguish from the ent
Makes sense. I had just preserved the old names but this seems better.


Line 111:   int num_rows = 0;
Dead variable.


Line 119:   uint8_t* data = nullptr;
> do we expect that if this field is accessed directly outside this struct, t
Reworked the Page abstraction so it's a proper class and moved some of the helpers from Sorter
into Sorter::Page


PS27, Line 637: INTERNAL_ERROR
> is this really an INTERNAL_ERROR (i.e. bug)? Or is this path taken when str
Good point. BTSV2 uses the MAX_ROW_SIZE error code, so I should use that here too. That has
a placeholder to mention the query option.

 I also realised we don't validate the fixed-length part fits in a page. Added a check to
Prepare(). I think now if the fixed-length part doesn't fit we'll spin in this function until
we run out of memory.


http://gerrit.cloudera.org:8080/#/c/5801/27/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

PS27, Line 71: pinned page
> nit: "pinned pages" or "a pinned page"
Done


PS27, Line 188: the
> nit
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7fc7fe1c04e9dfb1a0c749fb56a5e0f2bf9c6c3e
Gerrit-PatchSet: 26
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message