impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3286: Prefetching for PHJ probing.
Date Tue, 10 May 2016 23:47:07 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3286: Prefetching for PHJ probing.
......................................................................


Patch Set 2:

(16 comments)

I did an initial pass. I think we need to give the abstractions some more thought. I think
then the code will get simpler and we'll be able to reason better about the correctness of
the changes to the join algorithm.

http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 127: Status HashTableCtx::Init(RuntimeState* state) {
It would be kind of nice if we could just allocate all this stuff from a MemPool (not sure
which MemPool though).


Line 130:   MemTracker* query_mem_tracker = state->query_mem_tracker();
We should account it against the exec node's MemTracker, not the query.


Line 133:   int expected_usage = expr_values_size + expr_values_null_bits_size +
This calculation seems very error-prone and tied in with the allocations below. Is there any
way to restructure it to make it easier to reason about.


Line 134: BitUtil::Ceil(capacity_, 8)
I think the BitMap has a function to do this calculation.


Line 757:   Value* expr_values_buffer_ptr =
I think this works out simpler if you make the buffers external to HashTableContext, since
then these values are just function arguments rather than values we have to pull out from
a hardcoded address.


http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

Line 153:   /// Call to cleanup any resources.
Should document whether it's ok to call this if Init() fails. It's not really clear whether
it is right now.


Line 363:   /// Pointer into 'expr_values_buffer_array_' for the cached evaluated expression
As discussed, I feel pretty strongly that the expression buffering should be a separate abstraction
from the hash table context. I think we should move towards the HashTableContext being stateless.

I also think it would be better to avoid having implicit iterators tied in with the buffers
- let the caller use raw pointers, or provide an Iterator() abstraction


Line 380: indiciate
indicate.


Line 389:   boost::scoped_ptr<Bitmap> ignored_bitmap_;
Can we save an indirection here by just storing the bitmap inline?


Line 453: PrefetchState
PrefetchMode? State makes me thing that this is asome kind of state machine.


http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

Line 46: inline void HashTableCtx::PrepareForWrite() {
This assumes a particular usage pattern (linear read/write passes). I think a better abstraction
would let callers also do random access if they wanted to.


http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/partitioned-hash-join-node-ir.cc
File be/src/exec/partitioned-hash-join-node-ir.cc:

Line 260:       ht_ctx->NextRow();
I think this implicit iterator is kind of confusing to follow in here. It's really hard to
see that it's advanced in sync with the probe batch iterator.


Line 342:       if (current_probe_row_ != NULL) {
Could we factor out the ProcessProbeRow stuff into a separate function? It would make it easier
to follow the high-level logic of the function.


Line 386:   } while (current_probe_row_ != NULL && remaining_capacity > 0 &&
status->ok());
It would probably be more readable as a while() loop since the condition would be closer to
the initialisation up the top.


http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 466: prefetch_mode
Is this an enum defined somewhere? It's hard to know what this is or what the values mean.


http://gerrit.cloudera.org:8080/#/c/2959/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 251: int
It looks like this comes from an enum, so we should use the enum type instead of int.


-- 
To view, visit http://gerrit.cloudera.org:8080/2959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message