impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcel Kornacker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks
Date Mon, 08 Aug 2016 21:13:21 GMT
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks
......................................................................


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

Line 195:         &build_side_status));
indent 2 more


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/blocking-join-node.h
File be/src/exec/blocking-join-node.h:

Line 67:   boost::scoped_ptr<RowBatch> build_batch_;
is this only utilized when the join runs in a particular 'mode' (i don't mean the join op)?

given that the pool disappeared and we now have a single batch instead, it sounds like there
was a semantic change or tightening.


Line 122:   /// Set up 'current_probe_row_' to point to the first input row from the left
child
you mean from probe_batch_? i don't see anything in this function that restricts it to first
probe row.

the description of the behavior is almost as long as the function itself, this might be an
indication that it's not really a good abstraction.


Line 202:   void BuildSideThread(RuntimeState* state, DataSink* build_sink,
find a verb for this? ConstructBuildSide?


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/data-sink.h
File be/src/exec/data-sink.h:

Line 108:   /// Owned by the RuntimeState's object pool.
simply 'not owned' is fine (unless there's some special dependency that this class needs to
be aware of).


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 631:     PartitionPair* partition_pair;
call this dummy or something to that effect, to signal that it's not needed


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

Line 194:                   PartitionPair** partition_pair, bool no_more_rows);
odd indentation


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

Line 31:     MemTracker* mem_tracker) : DataSink(row_desc),
move initializer to new line


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/nested-loop-join-builder.h
File be/src/exec/nested-loop-join-builder.h:

Line 35: /// tuple memory, the non-copying mode is used and row batches are simply accumulated
in
meaning that you have to call AddBuildBatch and can't call Send?

if so, add a flag to c'tor that indicates mode and then dcheck for correct mode in interface
functions.


Line 90:   RowBatchList raw_build_batches_;
the meaning of 'raw' wasn't clear to me until after i read the .cc (we typically use it for
unparsed or unformatted data).

it seems like the distinction is between original (referenced) input and copied.


Line 92:   /// List of build batches that were deep copied and are backed by each row batch's
"deep copied out of raw_build_batches_"


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 247:   for (const FilterContext& filter: filters_) {
single line if it fits


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/exec/row-batch-cache.h
File be/src/exec/row-batch-cache.h:

Line 70:   /// GetNextBatch() has not been returned yet.
has not yet been called


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/runtime/plan-fragment-executor.cc
File be/src/runtime/plan-fragment-executor.cc:

Line 65: /// Helper function to log a batch of rows. VLOG_ROW must be enabled.
that seems generally useful. move to RowBatch?


http://gerrit.cloudera.org:8080/#/c/3842/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

Line 37: 
?


-- 
To view, visit http://gerrit.cloudera.org:8080/3842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message