impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-3344: Simplify sorter and document/enforce invariants.
Date Tue, 03 May 2016 23:12:51 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3344: Simplify sorter and document/enforce invariants.

Patch Set 5:


Made the changes and ran the sorter-specific tests locally. Running exhaustive and ASAN asynchronously
to get better coverage, but wanted to respond now while we still had the review fresh in our
File be/src/runtime/

Line 65:         input_row_batch_->TransferResourceOwnership(transfer_batch);
This transfer here didn't handle the need_to_return case properly - in that case it needs
to copy or consume all input data before advancing to the next batch.

Line 70:         RETURN_IF_ERROR(sorted_run_(&input_row_batch_));
> don't we need to transfer ownership for any empty batches? i.e. does L64-66
Yeah that's true, we should handle transfers here, for the case where the block boundary aligns
with the row batch size.

It turns out that there are multiple latent bugs here that were masked by 'deep_copy_input'
being set to true when merging intermediate runs from the sorter.

The old code didn't handle the case correctly either: it advanced to the next block without
giving the caller a chance to copy out resources.
File be/src/runtime/sorted-run-merger.h:

Line 47:   /// zero).
> is that now true with the loop you added?
Yes, the change moved the responsibility of handling empty batches into the SortedRunMerger()
from the Sorter().

It wasn't true before my change.
File be/src/runtime/

Line 48: blocks
> 'blocks'

Line 51: block != NULL
> why are there NULL blocks?
When iterating over the output it sets deleted blocks to NULL (instead of removing them from
the vector and having to update indices).

Line 58: two
> this directly contradicts the first sentence. let's reword that - the first

Line 59: and
> "that may contain"?
Done.I rewrote this paragraph a bit.

Line 61: using a merger
> either delete or reference the actual class name. Otherwise, this isn't ver

Line 90: last
> final

Line 92: Finalize
> How about FinalizeInput to make it clear it happens after consuming input b
I added a comment to the Run class header.

Line 106: The callee (Run)
> This Run

Line 116: Atmost
> At most

Line 116: (2)
> ?

Line 118: all rows in output_batch will have their fixed and var-len data from
        :   /// the same block
> what? doesn't var-len data live in a separate block from fixed?
Yeah, reworded. It will reference at most one of each.

Line 135: //
> ///

Line 143: //
> ///

Line 147:   // Finalize the list of blocks: update counters, delete empty final blocks, and
> ///

Line 157: var_slots
> outdated

Line 169: Status
> Can you comment on when is an error returned, since added indicates whether
Rewrote the comment to be clearer (it was pretty muddled).

Line 188: tuple
> 'tuple'

Line 189: var_len_blocks_index_
> '

Line 190:   /// data is in the next block.
> , in which case tuple is unchanged.

Line 211: int
> const
It doesn't work since I removed the calculation from the initializer list (I thought removing
const was probably the lesser evil compared to having a complicated expression in the initializer).

Line 214: int
> same
See above.

Line 251:   /// allocated block.
> If it was needed, it is deleted in Finalize().
That's not quite true, will clarify.

Line 254: so far
> Does this include the # returned?
Clarified the comment a bit

Line 285: TupleIterator
> Run should be finalized and unsorted, right? Can you state those conditions
It should be finalized. I think whether it's sorted or not is orthogonal to the iterator abstraction.

Line 290: 'run' and 'tuple_size' are passed as arguments to avoid
        :   /// the caller having to have redundant local variables with the same information
        :   /// when using multiple iterators.
> I don't get what this is referring to
Tried to clarify in the comment. The previous abstraction ended up compiling to fairly bad
machine code since it couldn't infer that first->run_ and last->run_ and first->tuple_size_
and last->tuple_size_ were the same things.

Line 305: 
> nit: remove line

Line 379:   void InsertionSort(const TupleIterator& first, const TupleIterator& last);
I remembered a previous discussion with Dan where he suggested renaming 'last' to 'end' to
make it clearer that it's non-inclusive. Will update 'first' to 'begin' to match. I think
it made sense to rename the iterators to 'left' and 'right' in Partition() since they are
actually inclusive there.

Line 399: swap tuple
> nit: swap_tuple

Line 610: UnpinAllBlocks
> If an error is returned, this doesn't return all blocks. I don't know if it
I added a comment in the header comment. There isn't anything we can do if we hit an error
in here (it should only really be stuff like scratch write errors).

Line 647:           if (!added) {
        :             Status status = Status::MemLimitExceeded();
        :             status.AddDetail(Substitute(MEM_ALLOC_FAILED_ERROR_MSG, "variable"));
        :             return status;
        :           }
> TryAddBlock says that calling with UNPIN_PREV should succeed. Is that wrong
Good point. I removed this code and added a DCHECK here and in TryAddBlock to assert the invariant.

Line 739: if it did, we would have to return them to the caller.
> Is it too tricky to add a DCHECK for this? I took a look and guess there's 
I had to add an accessor to RowBatch but it seems worthwhile. It only checks for attached
blocks but that seems ok for this case.

Line 779: // GetNext() fills rows
> More direct:

Line 803: ++var_len_blocks_index_;
        :       end_of_var_len_block_ = true;
> I think it's a little confusing to increment the block_idx and also set a b
I tried this earlier but eos handling in GetNext() got a little more complicated so I gave

I tried a different approach that seems to work out better, although the eos condition is
more complicated.

Line 838:   BufferedBlockMgr::Block* block = (*blocks)[block_index];
        :   BufferedBlockMgr::Block* prev_block = (*blocks)[block_index - 1];
> If you make the change i mentioned on l804, this'd be next_block and the li
Done. Seems to work out slightly cleaner.

Line 859:   for(const SlotDescriptor* string_slot: sort_tuple_desc_->string_slots()) {
> nit missing space after for?

Line 945: DCHECK_EQ(block_offset, 0);
> is this true because this is going to be the first string in the next block

Line 956: May be NULL for
        :     // zero-length strings.
> Please add a quick comment that this case comes from NULL+0=NULL, since NUL

Line 976: index_(index),
        :       tuple_(NULL) {
> nit: 1 line

Line 984: to 
> remove

Line 985:   // block_index_ as if it pointing to the last tuple. Add tuple_size_ bytes to
> it is

Line 1043:   temp_tuple_buffer_ = new uint8_t[tuple_size];
         :   swap_buffer_ = new uint8_t[tuple_size];
> will your later work clean up these untracked allocations?
I think we should eventually move a lot of these small allocations to mempools, but these
are small enough (and we only do them once per exec node) that I'm not too concerned about
doing that right now.

Line 1387: Last
> Why start calling this "Last"? At the time this is called it seems like it'
Changed it to 'Current'. I think it makes sense to have a name that makes it clear that it's

Line 1389:   sorted_data_size_->Add(unsorted_run_->TotalBytes());
> can you move this down after the sort happens?

Line 1394:     RETURN_IF_CANCELLED(state_);
> maybe move this below this scoped block, and the counter incr form l1389 ri
Moved it. I don't think the cancellation checking as-is makes sense. Michael is actually looking
at this soon.

Line 1403: int max_runs_per_final_merge =
         :       block_mgr_->available_allocated_buffers() / blocks_per_run;
> If I follow this (and the rest of the fn) correctly, sorting is more broken

Line 1416: sifficient
> sufficient

Line 1456: true
I discovered that setting deep_copy_input here was masking multiple resource-transfer bugs.
I'm going to leave it unchanged but add a TODO to remove it.

I set it to false and tried running the sort tests, and it seems to work ok, but it seems
safer to leave as-is until we have a simpler resource-transfer story.
File tests/query_test/

Line 121: mean
> means

Line 123: query
> nit: line above this

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I9c619e81fd1b8ac50e257172c8bce101a112b52a
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message