From: "Tim Armstrong (Code Review)"
To: impala-cr@cloudera.com, dev@impala.incubator.apache.org
CC: Dan Hecht, Matthew Jacobs
Reply-To: tarmstrong@cloudera.com
Date: Fri, 27 May 2016 21:44:03 +0000
Subject: [Impala-CR](cdh5-trunk) IMPALA-3344: Simplify sorter and document/enforce invariants.
Message-Id: <201605272144.u4RLi3UA011211@ip-10-146-233-104.ec2.internal>
X-Gerrit-MessageType: comment
X-Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
X-Gerrit-Commit: 3803e4c7e6d7da9561b4869424deb1bc744f1a30

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.
......................................................................

Patch Set 14:

(64 comments)

http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorted-run-merger.cc
File be/src/runtime/sorted-run-merger.cc:

Line 55: It may not succeed in advancing to the next row if
: /// the next batch is empty and has attached resources
> Is that true? it looks like the loop below will continue to the next batch
Old comment. Deleted.
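For context on the sorted-run-merger comments in this thread: SortedRunMerger performs a k-way merge over sorted runs using a min-heap of per-run cursors. Below is a minimal, self-contained sketch of that pattern only, with hypothetical types in place of RowBatch and the real run iterators; it is not the actual Impala implementation.

```cpp
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Sketch of a k-way merge over already-sorted runs (hypothetical types).
std::vector<int> MergeSortedRuns(const std::vector<std::vector<int>>& runs) {
  // Heap entry: (value, run index, position within that run).
  using Entry = std::tuple<int, size_t, size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> min_heap;
  for (size_t i = 0; i < runs.size(); ++i) {
    if (!runs[i].empty()) min_heap.emplace(runs[i][0], i, 0);
  }
  std::vector<int> merged;
  while (!min_heap.empty()) {
    auto [val, run, pos] = min_heap.top();
    min_heap.pop();
    merged.push_back(val);
    // Advance the run that produced the minimum; an exhausted run simply
    // drops out of the heap (the real merger transfers its resources to the
    // output batch instead of discarding them).
    if (pos + 1 < runs[run].size()) {
      min_heap.emplace(runs[run][pos + 1], run, pos + 1);
    }
  }
  return merged;
}
```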
Line 86: one row including current_row() remaining in the
: /// batch.
> i'm not sure what this is saying. also, looks like it's only called in Adv
Done


http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorted-run-merger.h
File be/src/runtime/sorted-run-merger.h:

Line 68: void TransferAllResources(RowBatch* transfer_resource_batch);
> i don't see any callers or implementation of this. looks like it should be
Done


http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 71: If the run is unsorted, it is sorted in-place in
: /// memory. Once sorted, runs can be spilled to disk if the sorter does not have enough
: /// memory.
> I think this could be clearer if the cause-effect relationship is clearer.
I think that's starting to describe the algorithm implemented by the sorter, rather than the Run abstraction. If the sorter wanted to build shorter in-memory runs and then merge them, I don't think we'd change the Run abstraction.


Line 83:
> (if there was a single run) ?
Done


Line 83:
> (for merging if there are multiple runs) ?
Done


Line 89: ~Run() { DeleteAllBlocks(); }
> it would be best to avoid doing buffer management in the destructor. can yo
Done


Line 94: (
> why parenthesis?
Removed them.


Line 102: start_index
> document
Done


Line 103: return AddBatch(batch, start_index, num_processed);
> It will be DCHECKed inside, but adding this dcheck might clarify to the rea
Done


Line 105: Status AddIntermediateBatch(RowBatch* batch, int start_index, int* num_processed) {
> likewise, for documentation purposes:
Done


Line 109: AddBatch
> Add*Batch()?
Done


Line 131: .
> is this correct: (otherwise, var-length slots already contain pointers rath
Yeah, if the run stays in memory we don't ever convert them.


Line 132: when
> after? (i.e. it won't be set on the last batch that returns the final rows)
Done


Line 132: read
> returned?
Done


Line 133: .
> and has no attached blocks.
Done


Line 134: 2
> two (to match "one")
Done


Line 136: 1 (2
> one (two
Done


Line 137: unpinned (Sorter holds on to the memory).
> is this true? don't we transfer the blocks as they are completed?
Yeah, that was wrong. Reworded.


Line 140: If we leave the last run to be merged in memory, the fixed-len blocks can be
: /// unpinned as they are consumed.
> aren't the blocks transferred in this case? so what's left to do?
I should have deleted this comment.


Line 145: deleted
> is that accurate? or should it say "transferred"?
Done


Line 161: AddBatch
> this naming (with AddBatchInternal) is a little confusing given that AddBat
I moved the if() into AddInputBatch()/AddIntermediateBatch() (a sketch of this wrapper pattern follows after these comments).


Line 231: const int last_tuple_block_offset_;
> do we move between blocks often enough to make precalculating this worthwhi
Done.


Line 247: /// True if the tuples in the run are currently in sorted order.
> Always true for intermediate runs.
Done


Line 254: deleted
> transferred or deleted?
Done


Line 263: deleted
> transferred or deleted?
Done


Line 268: allocated
> remove
Done


Line 268: FinalizeInput
> why do it during FinalizeInput() rather than just always do it at UnpinAllB
If it's a single in-memory run, we won't call UnpinAllBlocks(). I guess we still keep var_len_copy_block_ around for in-memory runs with var-len data, so maybe I'll just remove this special-case optimization. It would be nice to delete var_len_copy_block_ earlier in the cases where we won't need it, but it's probably simpler to just delete it when the run is finished. Left a TODO.
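The AddInputBatch()/AddIntermediateBatch() discussion above (Lines 102-161) comes down to thin public wrappers that assert their preconditions and forward to one shared implementation. A minimal, self-contained sketch of that pattern follows; the names and simplified types are hypothetical stand-ins, not Impala's actual Run class.

```cpp
#include <cassert>
#include <vector>

struct Row { int key; };

// Sketch of the wrapper pattern: each public entry point documents and
// asserts which kind of run it applies to, then forwards to one shared
// implementation, so callers never branch on the run type themselves.
class Run {
 public:
  explicit Run(bool initial_run) : initial_run_(initial_run) {}

  // Appends unsorted input rows; only valid for an initial (input) run.
  void AddInputBatch(const std::vector<Row>& batch) {
    assert(initial_run_);
    AddBatchImpl(batch);
  }

  // Appends already-sorted rows produced by an intermediate merge; only
  // valid for an intermediate run.
  void AddIntermediateBatch(const std::vector<Row>& batch) {
    assert(!initial_run_);
    AddBatchImpl(batch);
  }

 private:
  void AddBatchImpl(const std::vector<Row>& batch) {
    rows_.insert(rows_.end(), batch.begin(), batch.end());
  }

  bool initial_run_;
  std::vector<Row> rows_;
};
```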
Line 269: UnpinAllBlocks
> UnpinBlocks() doesn't literally delete that block, so maybe clarify this a
Removed the code it was explaining.


Line 312: begin
> Begin
Done


Line 315: end
> End
Done


Line 319: .
> also updates 'index_'?
Done


Line 348: invalid index
> can this be more specific (does e.g. Prev(Next()) always get you back to wh
-1 and run->num_rows() are the only valid out-of-range values, so I think that makes it clearer combined with the Next()/Prev() comments.


Line 492: else sorter_->spilled_runs_counter_->Add(1);
> do we use one-line syntax if there is an else clause?
I'm not sure. Added {} to avoid it being too quirky.


Line 622:
> nit: extraneous blank lines
Done


Line 633: Intermediate merge runs are unpinned.
> not sure what this is trying to tell me. that intermediate runs are always
I think the explanation is simpler: the run is unpinned, so all blocks in the run should be unpinned now that we're done writing them. Moved the DCHECK and updated the comment.


Line 651: DCHECK(is_sorted_);
> DCHECK(is_initial_) also?
Done


Line 664: if (HasVarLenBlocks()) {
The var_len_copy_block_ change (not deleting it in FinalizeInput()) required a change here, since it broke the assumption that !HasVarLenBlocks() implies no copy block.


Line 729: // and the individual blocks do not need to be pinned.
> Is there a more direct way of explaining/checking this? Is this the case w
Yes, I think the point is that the blocks are already all pinned. The comment was actually wrong about buffered_batch, so I fixed that.


Line 775: // if it did, we would have to return them to the caller.
> This comment is confusing. Is it saying that the assumption might not be va
Done


Line 781:
> nit: many of these blank lines don't help readability and cause less code t
Done


Line 807: DeleteAllBlocks();
I pulled this out of the condition, because the new invariant is that GetNext() with eos should clear the block vectors, even if all the blocks in the vector are NULL.


Line 812: } else {
> maybe get rid of this else since L813 is really the start of the case at 81
Moved it to just before the return.


Line 840: // The var-len data is in the next block, so end this call to GetNext().
> would be good to explain "why" rather than "what". is it so the var length
Done


Line 853:
> nit: remove blank line
Done


Line 874: output_batch->AddBlock(var_len_blocks_[var_len_blocks_index_]);
> couldn't we have attached this var len block above, and so now we're passin
Because we only advance the var-len block when an encoded pointer refers to a block index greater than the current one. There won't be any encoded block indexes past the last block.


Line 883:
> nit: remove blank line
Done


Line 904: RETURN_IF_ERROR(next_block->Pin(&pinned, curr_block, false));
> if there is an error here, is it okay that curr_block is still in *blocks?
Good point, that was a bug.


Line 1047: tuple
> block containing the last tuple?
Done. Had to reword the next sentence a bit.


Line 1050: index_ = run->num_tuples();
> isn't this done already?
Done


Line 1070: Can't move to block past the end.
> I was confused by this comment until reading the code because it's referrin
Done


Line 1088: // Can't move to block before the first.
> similarly: When moving before the first tuple, stay at the first block.
Done


Line 1341: block_mgr_->ClearReservations(block_mgr_client_);
> it'd be best to not do all this buffer management work in the destructor, b
Added Sorter::Close() to match the pattern elsewhere.
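Several of the comments above (Lines 312-348 and 1047-1088) concern the tuple iterator's contract: Next() and Prev() may move one position past either end, and -1 and num_tuples() are the only valid out-of-range positions. The sketch below illustrates just that contract over a simplified flat array; it is a hypothetical stand-in, not the sorter's real block-based TupleIterator.

```cpp
#include <cassert>

// Simplified iterator with sentinel positions at -1 and num_tuples(), so
// that stepping one past either end is well-defined but dereferencing there
// is not.
class TupleIterator {
 public:
  TupleIterator(const int* tuples, int num_tuples)
      : tuples_(tuples), num_tuples_(num_tuples), index_(0) {}

  bool AtEnd() const { return index_ == num_tuples_; }
  bool BeforeBegin() const { return index_ == -1; }

  void Next() {
    assert(index_ < num_tuples_);  // Can't advance past the end sentinel.
    ++index_;
  }

  void Prev() {
    assert(index_ > -1);  // Can't move before the begin sentinel.
    --index_;
  }

  int Get() const {
    // Only in-range positions may be dereferenced.
    assert(index_ >= 0 && index_ < num_tuples_);
    return tuples_[index_];
  }

 private:
  const int* tuples_;
  int num_tuples_;
  int index_;
};
```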
Line 1408: if (block_mgr_->num_free_buffers() < min_buffers_for_merge - blocks_per_run) {
> I don't understand why we shouldn't just always unpin this last run, given
I agree. See http://gerrit.cloudera.org/#/c/2908/

I deliberately avoided making algorithmic changes to this part of the code in this patch, since I think it was easier to reason about that set of changes independent of this cleanup. I think you're right about the var_len_copy_block_ case (it holds onto it for longer than needed), but I think we can defer that until the follow-up patch.


Line 1441: // in the pinned blocks in the single sorted run.
> Is this comment still relevant? What is it trying to tell me?
Yeah, we handle resource transfer in Run::GetNext(), so this isn't helpful.


Line 1444: DCHECK_EQ(output_batch->num_rows(), 0);
> what is this DCHECK documenting? why is this important on this path? (also
You're right, I don't think this is important any more.


Line 1448: // resource.
> is this comment still relevant given the cleanup to memory management? isn'
Yeah, that's right. Removed the comment.


Line 1456: merging_runs_.clear();
The invariants around Reset() aren't clear. I added logic here to clean up rows, assuming that GetNext() may not have returned eos before the Reset(). I'm going to talk to Alex to figure out whether this is even allowed.


Line 1504: Run* merged_run = obj_pool_.Add(
merged_run is a case where we relied on the destructor calling DeleteAllBlocks(), in case we hit an error before adding it to sorted_runs_. I fixed this by adding it immediately to sorted_runs_.


http://gerrit.cloudera.org:8080/#/c/2826/14/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

Line 48: /// Batches of input rows are collected into a sequence of pinned BufferedBlockMgr blocks
I cleaned up this paragraph a little.


Line 60: After the input is consumed, the sorter is left with one or more sorted runs. The
: /// client calls GetNext(output_batch) to retrieve batches of sorted rows. If there are
: /// multiple runs, the runs are merged using SortedRunMerger to produce a stream of sorted
: /// tuples. At least one block per run (two if there are var-length slots) must be pinned
: /// in memory during a merge, so multiple merges may be necessary if the number of runs is
: /// too large.
> not your change but this paragraph is kind of confusing because it talks ab
Reorganised to explain it in a more logical order.


Line 77: final merge)
> again confusing because the comment didn't really explain how merging happe
I think it should be clearer now.


Line 137: Finalizes
> not sure what it means to "finalize" unsorted_run_ from just reading this h
Reworded. I think it's clearer to just say that it's called when there are no more runs to be appended.


Line 199: unsorted
> they become sorted, right? so maybe just say "initial runs"?
Done

--
To view, visit http://gerrit.cloudera.org:8080/2826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 14
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Matthew Jacobs
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
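As an illustration of the sorter.h paragraph quoted above (Line 60): each run in a merge needs at least one pinned block, or two if there are var-length slots, so a fixed buffer budget caps the merge fan-in and can force intermediate merge passes. The arithmetic below is a rough, hypothetical sketch of that constraint, not Impala's actual buffer-reservation logic.

```cpp
#include <cstdio>

// Maximum number of runs that can participate in one merge, given that each
// run needs one pinned block (two with var-length slots).
int MaxMergeFanIn(int available_buffers, bool has_var_len_slots) {
  const int blocks_per_run = has_var_len_slots ? 2 : 1;
  return available_buffers / blocks_per_run;
}

// Number of merge passes needed to reduce 'num_runs' runs to a single
// sorted stream at the given fan-in (simplified pass counting).
int NumMergePasses(int num_runs, int fan_in) {
  if (fan_in <= 1) return -1;  // Not enough buffers to merge at all.
  int passes = 0;
  while (num_runs > 1) {
    num_runs = (num_runs + fan_in - 1) / fan_in;  // ceil(num_runs / fan_in)
    ++passes;
  }
  return passes;
}

int main() {
  // Example: 50 spilled runs with var-length data and 20 free buffers give
  // a fan-in of 10, so two merge passes are needed (50 -> 5 -> 1).
  const int fan_in = MaxMergeFanIn(20, /*has_var_len_slots=*/true);
  std::printf("fan-in=%d passes=%d\n", fan_in, NumMergePasses(50, fan_in));
  return 0;
}
```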