Mailing-List: contact dev-help@impala.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@impala.incubator.apache.org
Message-Id: <201605252232.u4PMWonP030428@ip-10-146-233-104.ec2.internal>
Date: Wed, 25 May 2016 22:32:50 +0000
From: "Dan Hecht (Code Review)" <gerrit@cloudera.org>
To: Tim Armstrong <tarmstrong@cloudera.com>, impala-cr@cloudera.com,
        dev@impala.incubator.apache.org
CC: Matthew Jacobs <mj@cloudera.com>
Reply-To: dhecht@cloudera.com
Subject: =?UTF-8?Q?[Impala-CR](cdh5-trunk)_IMPALA-3344:_Simplify_sorter_and_document/enforce_invariants.=0A?=
In-Reply-To: <gerrit.1461190325000.I9c619e81fd1b8ac50e257172c8bce101a112b52a@gerrit.cloudera.org>
References: <gerrit.1461190325000.I9c619e81fd1b8ac50e257172c8bce101a112b52a@gerrit.cloudera.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
User-Agent: Gerrit/2.10-rc0
archived-at: Wed, 25 May 2016 23:12:40 -0000

Dan Hecht has posted comments on this change.

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.
......................................................................


Patch Set 14:

(59 comments)

Nice cleanup.

http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorted-run-merger.cc
File be/src/runtime/sorted-run-merger.cc:

Line 55: It may not succeed in advancing to the next row if
       :   /// the next batch is empty and has attached resources
Is that true? it looks like the loop below will continue to the next batch in this case (attaching resources), so Advance() will always succeed at setting up the next row.


Line 86: one row including current_row() remaining in the
       :   /// batch. 
i'm not sure what this is saying.  also, looks like it's only called in Advance() so how about just inlining it there (or at least make it private)?


http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorted-run-merger.h
File be/src/runtime/sorted-run-merger.h:

Line 68:   void TransferAllResources(RowBatch* transfer_resource_batch);
i don't see any callers or implementation of this.  looks like it should be removed.


http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 71: If the run is unsorted, it is sorted in-place in
       : /// memory. Once sorted, runs can be spilled to disk if the sorter does not have enough
       : /// memory.
I think this could be clearer if the cause-effect relationship is clearer. Something like:

Initial runs continue to grow as long as more blocks can be allocated. If a new block cannot be allocated, then the run is sorted in-place and spilled to disk.  Then a new initial run is started.


Line 83:  
(for merging if there are multiple runs) ?


Line 83:  
(if there was a single run) ?


Line 89:   ~Run() { DeleteAllBlocks(); }
it would be best to avoid doing buffer management in the destructor. can you move this some place explicit and instead make the destructor dcheck that the run has no blocks?


Line 94: (
why parenthesis?


Line 102: start_index
document


Line 103:     return AddBatch<true>(batch, start_index, num_processed);
It will be DCHECKed inside, but adding this dcheck might clarify tot he reader that this can only be called on initial runs:

DCHECK(initial_run_);


Line 105:   Status AddIntermediateBatch(RowBatch* batch, int start_index, int* num_processed) {
likewise, for documentation purpoes:
DCHECK(!initial_run_);


Line 109: AddBatch
Add*Batch()?


Line 131: .
is this correct: (otherwise, var-length slots already contain pointers rather than offsets) ?


Line 132: when
after? (i.e. it won't be set on the last batch that returns the final rows).


Line 132: read
returned?


Line 133: .
and has no attached blocks.


Line 134: 2
two (to match "one")


Line 136: 1 (2
one (two


Line 137: unpinned (Sorter holds on to the memory).
is this true? don't we transfer the blocks as they are completed?


Line 140:  If we leave the last run to be merged in memory, the fixed-len blocks can be
        :   /// unpinned as they are consumed.
aren't the blocks transferred in this case? so what's left to do?


Line 145: deleted
is that accurate? or should it say "transferred"?


Line 161: AddBatch
this naming (with AddBatchInternal) is a little confusing given that AddBatch() itself is internal (i.e. private). Maybe some other way to distinguish/factor?


Line 231:   const int last_tuple_block_offset_;
do we move between blocks often enough to make precalculating this worthwhile? (probably no given that we do a multiply by capicity_ already in that function).


Line 247:   /// True if the tuples in the run are currently in sorted order.
Always true for intermediate runs.


Line 254: deleted
transferred or deleted?


Line 263: deleted
transferred or deleted?


Line 268: allocated
remove


Line 268: FinalizeInput
why do it during FinalizeInput() rather than just always do it at UnpinAllBlocks() time?


Line 269: UnpinAllBlocks
UnpinBlocks() doesn't literally delete that block, so maybe clarify this a bit.


Line 312: begin
Begin


Line 315: end
End


Line 319: .
also updates 'index_'?


Line 348: invalid index
can this be more specific (does e.g. Prev(Next()) always get you back to where you were or is that not the case if Next() got you to an illegal index)? like is it -1 or #tuples in those cases?


Line 492:   else sorter_->spilled_runs_counter_->Add(1);
do we use one-line syntax if there is an else clause?


Line 622: 
nit: extraneous blank lines


Line 633: Intermediate merge runs are unpinned.
not sure what this is trying to tell me. that intermediate runs are always !is_pinned_, or is it just saying what the next line of code is doing?

In any case, I think more explanation for the DCHECK at line 632 would be good, which could summarize both points I think:

Initial runs will always be pinned at this point (they still need to be sorted, and might be unpinned after sorting). Intermediate runs always have is_unpinned_.

So actually, couldn't the dcheck be stronger. move out of if-stmt and:
DCHECK_EQ(is_initial_, is_pinned_);


Line 651:   DCHECK(is_sorted_);
DCHECK(is_initial_) also?


Line 729:   // and the individual blocks do not need to be pinned.
Is there a more direct way of explaining/checking this?  Is this the case where there was only one run?  maybe from the run the is_pinned_ check is the only way to differentiate this case, but at least let's improve the comment.


Line 775:     // if it did, we would have to return them to the caller.
This comment is confusing. Is it saying that the assumption might not be valid (why else does it tell us what we have to do if the resources are attached)?  Rather than what we assume, let's just say what the invariant is:

Setting output_batch to NULL signals eos to the caller, so GetNext() is not allowed to attach resources to the batch on eos.


Line 781: 
nit: many of these blank lines don't help readability and cause less code to fit on the screen.


Line 812:   } else {
maybe get rid of this else since L813 is really the start of the case at 816.


Line 840:       // The var-len data is in the next block, so end this call to GetNext().
would be good to explain "why" rather than "what". is it so the var length block can be attached/deleted?


Line 853: 
nit: remove blank line


Line 874:         output_batch->AddBlock(var_len_blocks_[var_len_blocks_index_]);
couldn't we have attached this var len block above, and so now we're passing NULL to AddBlock()? Is that okay/intentional?  also, why wouldn't the code above (in the convert offsets path) have always taken care of this on the last block?


Line 883: 
nit: remove blank line


Line 904:   RETURN_IF_ERROR(next_block->Pin(&pinned, curr_block, false));
if there is an error here, is it okay that curr_block is still in *blocks?  Pin() always deletes release_block, right?


Line 1047: tuple
block containing the last tuple?


Line 1050:     index_ = run->num_tuples();
isn't this done already?


Line 1070: Can't move to block past the end.
I was confused by this comment until reading the code because it's referring to a non-existing block.  Maybe say:

// When moving beyond the last tuple, stay at the last block.

or similar


Line 1088:   // Can't move to block before the first.
similarly: When moving before the first tuple, stay at the first block.


Line 1341:   block_mgr_->ClearReservations(block_mgr_client_);
it'd be best to not do all this buffer management work in the destructor, but instead have it explicitly do this at the right point in time.  and then make the destructor just dcheck that the cleanup already happened.


Line 1408:   if (block_mgr_->num_free_buffers() < min_buffers_for_merge - blocks_per_run) {
I don't understand why we shouldn't just always unpin this last run, given that there's more than one run meaning the sum of all runs can't possibly fit in memory (at least given the memory we had while consuming input). The original code wouldn't have thought that there would now be more buffers available (because the BBM was per-node), but maybe that's what we're hoping for here?

Also not doing UnpinAllBlocks() always makes it harder to reason about the var_len_copy_block_ deletion.


Line 1441:     // in the pinned blocks in the single sorted run.
Is this comment still relevant? What is it trying to tell me?


Line 1444:       DCHECK_EQ(output_batch->num_rows(), 0);
what is this DCHECK documenting? why is this important on this path? (also if keeping it, make it one line).


Line 1448:     // resource.
is this comment still relevant given the cleanup to memory management? isn't that up to the merger_? What does it have to do with Sorter resources?


http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 60: After the input is consumed, the sorter is left with one or more sorted runs. The
       : /// client calls GetNext(output_batch) to retrieve batches of sorted rows. If there are
       : /// multiple runs, the runs are merged using SortedRunMerger to produce a stream of sorted
       : /// tuples. At least one block per run (two if there are var-length slots) must be pinned
       : /// in memory during a merge, so multiple merges may be necessary if the number of runs is
       : /// too large.
not your change but this paragraph is kind of confusing because it talks about GetNext() but then talks about stuff that might have happened earlier (intermediate merges) but isn't very clear about that.


Line 77:  final merge)
again confusing because the comment didn't really explain how merging happens (intermediate vs final). would be good to clarify the algorithm.


Line 137: Finalizes
not sure what it means to "finalize" unsorted_run_ from just reading this header. is there a concise way to explain that (at least the part that relevant to the caller)?


Line 199: unsorted
they become sorted, right? so maybe just say "initial runs"?


-- 
To view, visit http://gerrit.cloudera.org:8080/2826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 14
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes